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A.  TIGER  System 


1.  INTRODUCTION 


Rapid  increases  in  available  computational  power,  as  well  as  advances  in  algorithm  develop¬ 
ment,  have  facilitated  the  analysis  of  very  complicated  engmeering  problems.  These  enri- 
neering  problems,  however,  require  various  stages  of  anafysis  in  different  engineering  disa- 
plines,  such  as  flow  field  anjilysis  and  structural  anafysis,  in  order  to  validate  the  design.  These 
stages  are  traditionally  performed  separately  using  different  software  packages.  This  practice 
is  extremely  inefficient  and  time-consuming  and  do  not  contributes  the  quahly  of  the  results. 
Hence,  it  would  be  advantageous  to  couple  these  diverse  anal)^is  ^tems  into  a  seamless  pack¬ 
age,  which  could  be  used  by  a  design  engineer  without  speciali^  training  in  each  specific 
area. 

The  calculation  of  flow  fields  by  means  of  Computational  Fluid  Dynamics  (CFD)  techniques 
has  gained  a  lot  of  success  in  recent  years.  This  is  due  to  the  fact  that  the  accuracy  and  reliability 
of  CFD  calculations  have  been  improved  dramatically.  Currentfy,  it  is  a  common  practice  to 
use  CFD  to  help  in  validating  the  flow  field  associated  with  a  new  design.  There  are  three  ma¬ 
jor  stOTS  for  a  complete  CFD  simulation  cwcle.  They  are:  (i)  grid  generation  (pre-processing) 
(ii)  flow  solution  calculation  (processing),  and  (iii)  data  visual^tion  and  validation  (post¬ 
processing).  For  a  complicated  geometry,  such  as  a  missile  with  appendages,  the  grid  genera¬ 
tion  step  is  usually  a  labor-intensive  process,  and  it  generally  takes  more  ^an  half  of  the 
wall-dock  time  for  a  complete  CFD  simulation  cycle.  TIGER  [1, 2]  is  a  recently  developed 
system  which  drastically  speeds  tilie  grid  generation  process  for  such  configurations.  The  sec¬ 
ond  step,  flow  solution  calculation  however,  is  a  CPU-intensive  process.  It  usually  requires 
more  than  80%  of  the  computer  time  for  a  t^ical  CFD  application.  The  last  step  of  the  CFD 
^cle  is  to  anafyzeA^date  the  data  obtained  from  the  second  step.  This  step  is  neither  labor- 
intensive  as  the  grid  generation  step,  nor  as  CPU-intensive  as  tiie  second  step. 

The  processing  stage  is  generally  performed  on  a  supercomputer,  while  the  other  stages  are 
usually  execute  on  a  lo^  grapmcs  workstation.  This  implies  that  there  are  common  actions 
(i.e.  data  transferal,  data  manipulation,  boundary  condition  setup,  etc.),  whichauser  must  per¬ 
form  to  use  the  separate  paclmges.  These  actions  are  usually  highfy  tedious  and  time-oon- 
sun^.  Moreover,  the  heavily-loaded  supercomputers  generally  have  long  lists  of  jobs  wait¬ 
ing  in  &e  queue,  wUdi  may  result  in  a  long  and  unbearable  turn— around  time,  not  to  mention 
that  supercomputers  are  usuall^not  avail^le  to  the  design  engineer  in  a  small  to  medium  size 
enterprise.  Therefore,  how  to  unprove  the  effidency  of  engineering  applications  and  how  to 
utilize  the  available  resources  become  important  issues  that  need  to  be  addressed. 

Recent  advances  in  computer  hardware  and  architectures  have  promoted  the  development  of 
parallel  computing  in  a  distributed,  heterogeneous  environment  Such  an  approadi  a^ows  the 
decomposition  of  Imge  engineering  problems  that  require  intensive  floatingpoint  calculations 
into  smaller  “blocks”,  each  of  which  can  be  handled  by  a  workstation.  &i^eering  problems 
assodated  with  CFD  calculations  require  intensive  floating  point  calculation.  For  a  detailed 
flow  simulation,  it  usually  requires  a  supercomputer  to  cal^ate  the  governing  partial  differ¬ 
ential  equation,  known  as  the  Navier— Stokes  equations.  By  breaking  down  the  domain  into 
smaller  blocks  and  by  allowing  each  blodc  to  communicate  through  a  networl^  we  can  use  a 
duster  of  available  workstations  to  calculate  the  flow  solution,  visualize  file  result  etc.  in  a  dis¬ 
tributed  sense  wifiiout  using  supercomputers.  However,  this  distributed  approach  requires 
transferring  eaqilidt  messages  between  me  cooperating  worl^tions.  Keepmg  track  of  these 
messages  is  usually  a  diffic^t  and  error  prone  task. 

Network  Distributed  Global  Memoiy,  NDGM  [3],  developed  at  Army  Research  Laboratoiy 
(ARL)  is  a  shared  memoiy  model.  NDGM  is  implemented  on  top  of  a  message  passing  layer 


called  MRS,  Message  Relay  System.  It  allows  an  application  to  copy  sections  of  local  memory 
to  and  from  the  global  address  space.  This  model  also  allows  users  to  invoke  any  other  pro¬ 
cesses  which  may  then  access  the  data  from  the  shared  memory. 

NDGM  has  been  implemented  in  the  flow  solver  DZONAL  [4],  also  developed  at  ARL>  along 
with  a  simple  visuali^tion  tool,  Bop  View,  in  this  NDGM  enmonment  Tms  environment  al¬ 
lows  a  user  to  integrate  any  number  of  networked  workstations  to  contribute  to  the  overall 
memory  and  share  the  load  of  the  calculation. 

The  grid  generation  code,  TIGER  [1, 2],  is  developed  at  Mississippi  State  University  Engi¬ 
neering  Research  Center.  This  code  is  a  customized  code  for  axisymmetric  configurations. 
It  is  capable  of  generating  grids  for  turbomachinery  and  missile  configurations  in  a  fraction  of 
the  time  that  is  required  when  using  general-purpose  grid  code. 

An  overall  goal  of  this  project  is  to  establish  an  environment  which  can  be  utilized  for  a  wide 
spectrum  of  physical  modeling  and  analysis  disciplines.  An  initial  goal  is  to  integrate  the  key 
elements  of  a  CFD  application  (TIGER,  DZONAL,  and  Bop  View)  to  build  a  comprehensive 
missile  simulation  ^tem  will  perform  grid  generation,  flow  solution  calculation,  and  solution 
visualization  utilizing  tiie  NDGM  shared  memory,  heterogeneous  environment  The  utiliza¬ 
tion  of  a  cluster  of  heterogeneous  workstations  ^ows  all  available  resources  to  be  allocated 
as  needed.  The  cumulative  power  of  such  system  should  be  sufficient  for  even  the  most  com¬ 
putation-intensive  problems.  Multi-disdplinary  analysis  capability  can  be  added  in  a  modu- 
lar  fasMon  into  this  model  as  desired.  The  close  coupling  of  the  components  and  central  con¬ 
trol  ly  the  graphical  user  interface  should  provide  significant  productivity  improvement  verses 
current  practice. 

In  acomplete  CFD  cycle,  grid  generation,  flow  solution  calculation,  and  solution  visualization/ 
validation  are  the  three  major  steps.  Traditionally,  these  steps  are  usually  performed  indepen¬ 
dently.  therefore,  a  lot  of  precious  time  and  effort  is  wasted  due  to  the  improper  data  transfer¬ 
ring,  inconsistent  data  format,  and  repeatedly  tedious  solution  monitoring  dming  the  initial 
stage  of  flow  parameter  setup.  Integration  of  existing  softwares  as  modules  into  a  common 
environment  Is  a  promising  approaeh  to  reduce  the  potential  for  errors  In  these  steps  of  a  CFD 
application.  A  comprehensive  simulation  system  should  at  least  include  the  grid  generation 
moditie,  flow  solution  calotiation  module,  and  visualization  module.  These  modules,  though, 
need  to  be  couples  to  some  degree,  and  should  be  independent  of  one  another  so  that  any  of 
them  may  be  replaced  in  a  timefy  fashion  for  eitiier  performing  different  tasks,  or  for  simpfy 
updating  the  algorithms. 

The  grid  generation  process  is  current^  performed  in  a  user-friendty  graphical  environment 
However,  to  generate  a  grid  for  a  missile  configuration,  it  may  still  take  days  or  even  weels 
of  effort  using  a  general  purpose  grid  generation  code,  since  the  user  muri  hand-carve  each 
single  boundary  and  manipulate  the  sunaces  manually.  TTGE^  a  (^omized  grid  generation 
code  tiiat  is  originally  developed  for  turbomadiineiy  applications,  is  an  appropriate  choice  to 
be  tiie  grid  generation  module  of  this  integrated  missile  simulation  qrstem  smce  it  has  b^n 
optimized  for  axiymmetric  configurations.  It  has  proved  that  it  reduces  the  grid  generation 
efrbrt  by  ah  order  of  magnitude  for  complicated  turbomachineiy  geometries,  as  compared  to 
the  effort  required  when  using  a  general  purpose  grid  code. 

During  the  initial  stage  of  running  the  flow  code  for  a  configuration,  it  is  traditional  for  an  engi¬ 
neer  to  run  the  flow  code  for  a  smaU  number  of  iterations  on  a  supercomputer,  transfer  the 
data  back  to  the  local  host,  and  invoke  a  flow  solution  visuallmtion  software  to  check  if  the 


parameters  and  boundary  conditions  are  set  property.  Such  a  process  may  iterate  several  times 
before  the  engineer  is  confident  enough  to  make  a  long  run  of  the  flow  code.  This  process 
however,  is  tedious  not  only  because  the  user  needs  to  wait  for  the  queue  on  file  supercomput¬ 
er,  but  also  because  each  operation  takes  time,  such  as  transferring  ^e  huge  data  set  back  from 
the  remote  supercomputer.  Using  a  cluster  of  workstations  on  the  local  network  to  share  the 
load  of  sudi  cmculation  reduces  load  on  the  supercomputer  and  hence  speeds  the  turn  around 
time.  Moreover,  using  the  NDGM  shared  memory  scheme,  a  user  not  only  can  distribute  the 
calculation  to  the  cooperating  workstations,  but  also  can  invoke  various  processes,  such  as  a 
visualization  module,  that  accesses  the  data  from  the  same  shared  memory  as  tiie  flow  solver. 
This  reduces  the  effort  of  transferring  the  data  explidtty  through  the  network. 

The  proposed  s]^tem  will  be  built  in  the  shared  memory  environment,  with  different  softwares 
in  different  engineering  disdplines  being  integrated  into  the  ^em  as  modules.  These  mod¬ 
ules  will  be  easy  to  replace  by  similar  so^ares  for  easy  update.  The  shared  memory  environ¬ 
ment  should  also  be  studied  to  improve  its  efficiency.  Software  in  different  disciplines  need 
to  be  coupled  so  that  the  communications  of  different  modules  can  be  smooth  and  fiiey  can 
be  added  iuto  the  qrstem  as  desired.  Such  a  system  will  significantly  inorease  the  productivity 
for  solving  problems  with  the  utilization  of  available  local  resources. 

2.  RESEARCH  OBJECTIVES  &  MOTIVATION 

In  a  conylete  CFD  (Computational  Fluid  Dynamics)  cycle,  geometry-grid  generation,  exe¬ 
cution  of  flow  solver  (numerical  solution  of  non  linear  partial  differential  equations  pertinent 
to  fluid  phenomena  to  be  simulated),  and  solution  visualization/validation  are  the  three  major 
steps.  Haditionalty,  these  steps  are  performed  inde^ndentty.  Therefore,  a  significant 
amount  of  predous  time  and  effort  is  wasted  due  to  the  improper  data  transferring,  inconsis¬ 
tent  data  format,  and  repeatedly  tedious  solution  monitoring  during  the  initial  stage  of  flow 
parameter  setup.  Also,  geomet^— grid  generation  (pre-processing),  and  solution  visualiza¬ 
tion  ^st-processing)  are  performed  on  graphical  workkations.  However,  the  flow  solver 
is  executed  on  the  mainframe  or  a  supercomputer  in  most  cases.  Integration  of  existing  soft¬ 
ware  for  these  functionalities  as  modmes  into  a  common  distributed/parallel  computing  envi¬ 
ronment  is  a  promising  approach  to  reduce  an  overall  <ycle  time  and  the  potential  errors  in 
the  aforestat^  steps. 

An  overall  objective  for  this  project  is  to  develop  a  comprehensive  simulation  ^em  tailored 
for  missile  configurations  by  integrating  grid  generation,  flow  solution  and  visualization  as 
software  modules.  These  modules,  though,  neM  to  be  coupled  to  some  de^ee,  should  be  in- 
dependentof  one  another  so  that  any  ofthemmay  be  replacki(or  updated)  inatimety£ashion. 
The  grid  generation  module  should  be  enhanced  for  solution  adaptivity  and  for  effident  and 
accurate  geometry  treatment. 


3.  TECHNICAL  APPROACH 

The  development  of  sudi  an  integrated  system  in  distributed  computing  environment  is  a  for¬ 
midable  task.  The  approach  adapted  in  ^  project  calls  for  utilization  of  existing  modules  for 
eMdeniy.  In  this  regard: 

•  Grid  Generation:  The  TIGER  system  developed  at  the  Mississippi  State  Univer¬ 
sity  (MSU)  is  utilized  for  grid  generation.  The  TIGER  system  is  a  grid  generation 


software  customized  for  axisymmetric  three-dimensional  configurations.  A  de¬ 
tailed  description  of  the  TIGER  system  Is  provided  In  Appendix  A.1 . 

•  Flow  Solver:  At  Initial  stage  the  computer  code  PARC,  developed  at  Arnold  En¬ 
gineering  Development  Center,  for  simulating  thin  layered  Navler  Stokes  equa¬ 
tions  is  used  for  flow  calculations. 

•  Solution  Visualization:  The  visualization  module  BopVlew  of  the  TIGER  system 
is  used  as  visualization  module. 


•  Distributed  Environment  The  Netwoit  Distributed  Global  Memory  (NDGM)  sys¬ 
tem  developed  at  the  Army  Research  Laboratory  is  a  shared  memory  model  Im¬ 
plemented  on  top  of  a  message  passing  layer  called  MRS  (Message  Relay  Sys¬ 
tem).  This  system  Is  utilized  as  a  computing  platform  to  accomplish  distributed 
processing. 

•  Geometry  and  CAD  Interface:  The  computer  code  CAGI  (Computer  Aided  Grid 
Interface)  developed  at  MSU  Is  utilized  for  geometry/surface  preparations. 

•  AdaptSd:  The  adaptive  grid  system  under  development  at  MSU  is  used  for  grid 
adaption  and  redistribution. 

TTiedeveloi^mentofthis  integrated  system  is  accomplished  in  following  four  phases:  (i)  POP 
(Proof  of  Pnnciple)  ^tem  —  the  POP  system  is  accomplished  by  considering  the  TIC®R  as¬ 
tern  as  the  basis.  Tlie  grid  generation  and  flow  visualization  modules  of  the  TIGER  are  inte¬ 
grated  with  the  PARC  system  to  simulated  missile  flow  in  an  integrated  fashion.  A  network 
module  to  allow  data  communication  between  heterogeneous  environment  based  on  TCP/IP 
communication  protocols  is  developed  for  data  transfer.  This  ^^em  should  be  considered  as 
a  testbed  to  demonstrate  the  feasibility  of  an  overall  integration  system. 

(ii^  Grid  Captation  and  CAGI  system:  the  grid  adaptive  algorithms  and  CAGI  (Computer 
Aided  Grid  Interface)  system  modules  should  be  enhanced  to  facilitate  geometrical  entities 
encountered  in  genersd  missile  configurations.  These  modules  should  be  enhanced  and  pre¬ 
pared  for  the  automatic  indusion  in  the  integrated  ^tem. 

(iii)  l&ihancements  in  Grid  adaptive  algorithms:  the  grid  adaptive  algorithms  should  be  en¬ 
hanced  to  improve  the  effidency,  robustness  and  accuracy  in  view  of  parallel/distributed  envi- 
ro^ent  The  grid  adaptive  module  in  AdaptSd  is  based  on  the  grid  movement  (keep  the  grid 
points  fixc^)  pertinent  to  solution  features.  The  equi-distribution  prindple  is  applied  in  high¬ 
er  dimension  to  establish  distribution  oriteria.  An  appropriate  blendmg  of  algebraic  and  ellip- 
tic  systems  are  utilized  to  accomplished  wel}  distributed,  smooth  and  near  orthogonal  adaptive 
^d.  The  enhancements  must  address  parallelization  of  fiie  multiblock  elliptic  solver  utmz^ 
in  the  grid  adaptive  ^em. 


(^)  ,CFD“NDGM  S3^em:  the  final  phase  is  to  bring  all  modules  together  and  inco: 
them  in  the  NDGM  environment  This  work  is  to  be  accomplished  in  collaboration  with 
researdiers. 


4.  SIGNIFICANT  ACCOMPLISHMENTS 

The  first  three  phases  of  this  integration  project  have  been  accomplished.  However,  the  last 
phase  has  not  been  accomplished  at  the  level  where  all  the  modules  operate  in  the  CFD-NDGM 
environment  very  coherently. 

The  integrated  testbed  system  using  TIGER  code  has  been  completed  and  validated  utilizing 
missile  configurations  of  interest  to  Army  Research  Laboratoiy  (ARL).  The  pre  and  post 
processing  steps  associated  with  the  integrated  system  can  be  executed  on  an  SGI  workstation 
However,  the  flow  solver  can  be  executed  on  an  SGI  Workstation.  The  system  has  been 
successfully  implemented  in  the  existing  NDGM  System  at  ARL.  A  five  block  erid 
configuration  associated  with  the  Wrap  around  missile  and  the  resulting  execution  speed  on  the 
power  challenge  array  obtained  by  ARL  researchers  are  presented  in  figure  1-2.  The  CFD 
simulation,  in  parallel,  associated  with  multiple  duct  engine  configurations  were  performed  with 
the  mtegrated  TIGER  System.  The  geometrical  configuration  along  with  the  speed-up  obtained 
are  presented  in  the  figures  3-4.  The  Tiger  code  description  is  provided  in  Appendix  A. 


The  NonUniform  Rational  BSpline  (NURBS)  based  grid  generation  technology  has  been 
enhanced  to  improve  grid  quality  in  view  of  optimizing  smoothness  and  orthogonality  by 
redistributing  interior  NURBS  control  points.  A  fast  NURBS  curve-surface  evaluation 
algorithm  developed,  under  NASA/MSFC  contract  has  been  incorporated  for  evaluation  of 
qarametric  NURB  entities. 

The  detailed  technical  discussions  on  this  development  is  reported  in  the  following  publication: 

•  Shih,  M.H.,  Stokes,  M.L.,  Huddleston,  D.  and  Soni,  B.K.,  ‘Towards  in  integrated 
CFD  System  on  a  Parallel  Environment,”  Parallel  Computational  Huid  Dynamics: 
Implementations  and  Results  Using  Parallel  Computers,  A.  Eccer,  J.  Periaux,  N. 
Satofuka  and  S.  Taylor  (Eds.),  pp.  437-445,  AIAA-95-2338, 1995. 

This  publication  is  enclosed  in  the  technical  publication  section  5.1. 

The  grid  adaptive  and  CAGI  (Computer  Aided  Grid  Interface)  modules  have  been  enhanced  to 
efficiently  address  missile  configurations.  All  functional  modules  are  now  ready  to  be 
implemented  on  the  overall  NDGM  framework.  This  work  is  being  accomplished  by  working 
side-by-side  with  ARL  researchers.  The  detailed  results  and  the  testbed  integrated  system  has 
been  transferred  to  ARL. 

New  weight  functions  employing  boolean  sums  is  developed  to  represent  local  truncation  errors 
for  adaptation.  The  adaptive  procedure  redistributes  mesh  points  based  on  existing  flow  fields. 
The  technique  has  been  applied  to  a  missile  configuration  (associated  with  KTA2-12  project) 
where  flow  field  involves  supersonic  freestream  conditions  at  angle  of  attack,  large  scale 
separated  vortical  flow,  vartex-vortex  and  vartex-surface  interactions  separated  shear  layers 
and  multiple  shocks  of  different  intensity.  The  results  have  demonstrated  the  capability  of 
adaptive  aforestated  complicated  flow  feating. 

The  technical  discussions  associated  with  this  development  reported  in: 

•  Thornburg,  H.,  Soni,  B.K.  and  Boyalakuntla,  K.,  “A  Structured  Based 

Solution-Adaptive  Technique  for  Complex  Separated  Flows,”  Journal  of  Applied 
Mathematics  &  Computation,  to  appear. 

This  paper  is  enclosed  in  section  5.2. 


Figure  1.  Fire  Block  Grid-Wrap  Around  Missile 
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A  Parallel  Multiblock  Eliptic  Grid  Generator  for  structured  grids  in  any  complex  topoloev  has 
been  designed.  The  features  of  this  code  are: 

•  Each  block  is  run  on  an  individual  processor  with  load  balancing  done  by  threads  on 
multi-processor  shared  memory  machines. 

•  Code  designed  to  have  minimum  cache  misses  for  efficient  operation  on  shared 
memory  machines. 

•  MPI  used  as  the  message  passing  library. 

•  Parallel  I/O  as  far  as  possible. 

•  Solution  based  adaptive  control  of  grids 

•  Movement  of  body  surface  points  using  NURBS. 

Tests  on  a  5  block  grid  (~1.3  million  points)  around  a  missile  fin  have  shown  speedup  of  6.72 
times  using  7  processors  of  an  8  processor.  This  configuration  is  presented  in  figure.  5. 

SGI-ONYX 


Figure  5.  Multiblock  Grid  Around  Missile-Fin. 


A  detailed  technical  discussions  on  this  development  are  reported  in  the  following  two 
publications. 

•  “Parallel  Solution-Adaptive  structured  Grid  Generator  for  complex  multiblock 
topologies”.  Proceedings  of  the  International  conference  on  parallel  and  distributed 
processing  techniques  and  applications,  1997. 

•  “Parallel  Adaptive  Grid  Generation  for  structured  multiblock  domains”  Master’s 
thesis,  Mississippi  State  University,  May,  1997. 

These  publications  are  included  in  section  5.3  and  5.4  respectively. 

^1  these  modules  have  been  transferred  to  the  researchers  at  ARL.  These  modules  will  be 
incorporated  in  the  NDGM-DICE  environment  at  ARL.  The  successful  development  of  this 
system  will  provide  significant  productivity  improvement  associated  with  CFD  simulation 
involving  complex  configurations. 


5.0  Technical  Publications 

5.1  “Towards  An  Integrated  CFS  System 

parallel  environment”,  AIAA-95-2338 
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Towards  an  Integrated  CFD  System  in  a  Parallel  Environment 

M.  Shih*, 

M.  Stokes'*', 

D.  Huddleston’, 

B.Soni® 

ABSTRACT 

This  paper  describes  the  strategies  applied  in  parallelisation  of  CFD  solvere  in  a  p^lelto- 
tributed  mem^  environment  using  a  message  passing  paradigm.  To  ««>“tor  the  ^uMion  pro- 
^s  ^direct  r^dering  via  the  OpenGL  graphics  API  is  investigate  ^d  shown  to 
performance  Acustomized,  integrated  CFD  simulation  system,  TIGER,  tailored  for  turbomachiMry 
Mnfigurations,  executable  in  a  distributed  environment  is  developed,  ^mputetional  ecamplw  dem- 
onstSingthe  successful  parallelization  of  the  NPARC  code  with  rel-t«ne  v^ualization  and  the  TI¬ 
GER  system  are  presented.  Future  expansion  to  this  environment  is  discussed. 

1.  INTRODUCTION 

The  primaiy  objective  of  the  National  Science  Foundation’s  Engineering  R^A^n^ 
(ERC)  at  Mississippi  State  University  is  to  reduce  the  man/machme  resoi^  requured  to  i^rfoim 
S^utational  Field  Simulation(CFS).  The  approach  to  reducing  the  machme  resource  is 
1)  dtsien  of  specialized  hardware  such  as  multi -computers  with  veiy  low  laten<y  commumcations, 
and  2)  &e  de^gn  of  distributed  computing  environments  to  take  advantep  of  heterogenw^  ^ec- 
tions  of  personal  computers,  workstations,  and  supercomputers  if  aimlable.  The 
ing  the  empower  resource  is  primarily  through  the  development  of  advanced  software  tools  which 
efficiently  generate  and  edit  grids,  and  visualize  the  solution  field. 

Classically,  the  process  of  CFS  is  thought  of  as  a  pre-procwsing  phaM  of  grid  generation,  the 
solution  phase  and  a  ^-processing  phase  comprised  of  visuahzation.  However,  examination  p 
this  process  in  a  production  environment  Ulustrates  thew  phases  are  not  sparate 
exist  in  at  least  binary  pairs.  For  example,  grid  generation  requires  visualization  of  the 
the  ran^ruction  p^re  to  be  effective.  Grid  generation  requires  one  or  more  iterations  of  the  field 
solution  to  adequately  capture  viscous  phenomena  and  shocks.  The  solution  prpess  also  requ^  ^ 
sualization  to^etennine  the  accuracy  of  boundary  and  ruri-time 

problems,  the  solution  phase  may  require  an  integrated  gnd  generation  capabihty  to  provide  solution 

adapti^ty  d^  this  problem,  the  ERC  formed  the  Integration  Thrust  to  investigate  the 
integratedenvironmeiLfor  the  CFS  process.  Conceptually,  “  “‘^Srateden^ronmentwoi^^u^ 
or^^lify  the  logistics  of  moving  between  phases  and  therefore  reduce  wall  clock  time  as  we 
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reduce  the  expertise  r^ired  to  perform  simulations.  Early  testbeds  for  integrated  ^tems  revealed 
that  simple-minded  integration  of  existing  grid  generation,  solver,  and  visualization  tools  were  im¬ 
practical,  and  would  have  limited  longevity  due  the  volatile  nature  of  current  day  tools.  These 
testbeds  resulted  in  two  conclusions  which  coincide  with  near-term  and  far-term  goals,  respective¬ 
ly^ 

1.  The  solver  set-up/execution  phase  and  real-time  visualization  can  be  integrated  in  a  distributed 
environment  using  off-the-shelf  technology. 

2.  Grid  generation  codes  must  be  redesigned  to  support  distributed  memory  environments  and  sup¬ 
port  object-oriented  concepts  and  therefore  are  not  currently  mature  enough  for  integration. 

Concentration  in  this  paper  is  placed  upon  the  solver  set-up/executation  phase  and  the  real¬ 
time  visualization. 

The  casual  observer  may  wonder  why  visualization  must  be  distributed  and  why  it  should  be 
real-time.  The  anwer  to  the  first  question  is  that  current  trends  in  CFS  point  to  using  a  distributed 
memory  parallel  environment  for  large  scale  applications[l.  ].  With  simulations  routinely  contain¬ 
ing  more  than  a  million  nodes,  memory  requirements  for  graphic  workstations  would  overwhelm 
even  the  most  current  hardware,  thus  the  distributed  architecture  is  required  to  mass  enough 
memory  to  contaiii  the  problem.  The  real-time  features  are  require  to  monitor  early  or  transient 
phases  of  the  solution  to  minimize  loss  of  computing  resource  due  to  incorrect  boundary  or  run-time 
parameters.  Note  that  what  is  needed  here  is  not  publication  quality  graphics,  but  rather  graphics 
which  illustrate  faults  in  the  solution  process. 

The  progress  reahzed  in  the  development  of  the  aforestated  integrated  environments  for  the 
CFS  process  is  presented  in  this  paper.  In  particular,  the  development  associated  with  following  issues 
are  described: 

i.  ParaUelizingthe  widely  utilized  off-the-shelf  CFD  ^stem.  The  NPARCcode[2.  ]  was  chosen 
for  this  research  effort  due  to  it’s  availability  to  US  research  and  academic  communities,  and 
that  it  is  heavily  used  in  US  Federal  Labs  and  industry.  Though  the  NPARC  code  has  been 
parallelized  by  private  companies  such  as  BoeingfS.  ],  no  parallel  version  was  available  for 
general  distribution  as  of  the  time  of  this  writting.  The  NPARC  code  was  a  natural  choose 
for  pa^lelization  because  of  its  multi— block  structured  design.  This  aspect  is  discussed  in 
detail  in  the  following  section. 


ii.  Real  time  visualization  in  a  a  distributed  environment.  Here,  an  architecture  for  indirect 
rendering  using  OpenGL  graphics  library  is  described 

iii.  Customized  integrated  CFD  simulation  process  for  a  class  of  configurations.  An  integrated 
^stem,  TIGER,  customized  for  turbomachinery  application  is  described. 

2.  THE  NPARC  SOLVER 

The  NPARC  solver  started  life  as  a  derivative  of  the  ARC[7]  suite  of  codes  from  NASA  Ames, 
but  gained  popidanty  after  being  heavily  modified  by  Cooper[2]  at  Arnold  Engineering  Development 
Center^DC)  m  Tullahoma,  TN.  under  the  name  PARC,  or  Propulsion  ARC  code.  The  NPARC  code, 
a  close  denvative  of  PARC,  solves  either  the  Reynolds-averaged  Navier-Stokes  equations  or  the  Eul¬ 
er  equations,  subjert  to  user  specification.  NPARC  uses  a  standard  central  difference  algorithm  in 
flux  calculation,  with  Jameson[9]  style  artificial  dissipation  applied  to  muinfflin  stability.  NPARC 
has  two-time  integration  algorithms  available  including  Pulliam’s[7]  diagonalized  Beam  and  Warm- 
^approa^te  factorization  algorithm  or  a  multi-stage  Runge-Kutta  integration.  The  diagonal- 
algorithm  was  applied  herein.  Among  other  characteristics,  NPARC  utilizes  conservative  metric 
o^eren<^g,  offers  loc^  time— step  options,  and  provides  a  generalized  boundaiy  treatment.  The 
c^e  canbe  applied  in  either  a  single  or  multi— block  mode.  This  results  in  acode  which  is  quite  gener¬ 
al  ^  ®PPly  to  complex  configurations.  The  NPARC  multi— block  code  supports  onfy  one  grid 

block  m  memory  at  a  time,  caching  the  other  blocks  to  disk.  The  main  loop  would  write  out  the  current 
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block  (if  not  the  first  time),  read  in  and  process  the  second  block,  and  the  cycle  would  repeat.  Similari- 
ly,  boundaiy  conditions  were  handled  by  files  which  were  subsequently  read  by  a<}}acent  blocks  once 
in  memory.  This  design  made  it  quite  simple  to  produce  a  parallel  code  from  this  original  design. 

The  PANARC,  or  Parallel  National  ARC  code,  as  coined  at  the  ERC,  took  the  original  design 
and  actuall  simplified  it.  Block  caching  was  eliminated  such  that  each  process  only  operated  on  one 
block,  and  the  boundary  condition  routines  replaced  the  previous  I/O  calls  with  PVMflO]  send/receive 
calls.  Boundaiy  and  run-time  conditions  are  defined  in  a  FORTRAN  namelist  file  and  were  not  al¬ 
tered  from  the  original  format.  However,  each  process  only  reads  what  is  pertinent  for  each  respective 
block. 

To  simplify  the  run-time  logistics,  PANARC  assumes  each  process  can  access  the  same  set  of 
input  files,  which  under  the  UNIX  Operating  System,  is  accomplished  by  NFS.  PVM  was  choosen  for 
the  interprocess  communication  library,  over  the  then  immature  MPI[11]  libary  because  MPI  lacks 
support  for  dynamic  process  management.  However,  it  is  apparent  that  from  commercial  industry 
support  among  a  large  class  of  workstation  and  supercomputer  vendors,  MPI  will  be  the  preferred 
choice  for  the  message  passing  interface  of  the  future.  Features  lacking  now  are  expected  to  be  in¬ 
cluded  in  future  versions  of  MPI. 

Memory  in  the  NPARC  code  is  allocated  in  the  main  program  as  a  sin^e  work  array.  The  only 
change  to  support  indirect  rendering  was  to  replace  the  DIMENSION  statement  with  a  C  language 
call  that  allocates  shared  memoiy  (see  Figure  1).  This  change  allows  any  other  process  (with  proper 
permissions)  to  access  the  main  memory  used  by  the  solver.  It  is  through  this  mechanism  that  the 
visualization  module  gains  access  to  the  field  variables. 

3.  ARCHITECTURE  FOR  INDIRECT  RENDERING  USING  OPENGL 

The  premise  for  the  design  for  this  ^stem  is  the  concept  of  domain  decomposition,  which  relies 
on  the  solution  domain  being  broken  down  into  (for  this  case)  equal  size  domains,  each  domain  being 
mapped  to  a  remote  computer.  To  minimize  data  transfer  during  the  visualization  phase,  the  render¬ 
ing  calls  are  issued  on  local  data  rather  than  transfering  the  data  to  the  display  host  (which  was  not 
an  option  since  the  graphics  workstation  did  not  have  enough  memoiy  to  store  the  data  in  the  first 
place).  This  concept  relies  on  the  Method  of  Extracts!  12]  which  suggests  that  what  is  being  visualized 
is  a  subset  or  extraction  dimensionally  lower  than  that  of  the  original  data.  In  terms  of  solution  moni¬ 
toring,  this  is  almost  always  the  case. 

A  design  contraint  was  for  the  visualization  scheme  be  as  unobtrusive  as  possible  to  the  simula¬ 
tion  process  as  well  as  minimize  code  modification  to  the  solver.  In  addition,  the  visualization  capabil¬ 
ity  must  be  attachable/detachable  at  any  time  during  the  simulation.  The  process  toplogy  that  evolved 
is  shown  in  Figure  1. 

In  this  topology,  visualization  is  facilitated  as  a  separate  process  on  each  host.  Communication 
between  the  solver  and  visualization  process  on  each  processor  is  through  shared  memoiy.  Optional 
features  in  this  topology(&hown  in  dashed  lines)  are  (1)  the  existence  of  a  solver  process  on  the  display 
host  (not  recommended  for  large  scale  visualization),  and  (2)  enabled  message  passing  between  a  visu¬ 
alization  client  on  the  display  host  and  the  solver  clients.  The  latter  option  may  be  convenient,  but 
generally  is  not  recommended  in  that  it  violates  the  requirement  that  the  solver  and  visualization  pro¬ 
cesses  be  independent. 

In  the  current  model,  the  visualization  client  running  on  the  display  host  has  the  responsibity 
of  creating  the  shared  window,  acting  on  mouse  and  button  events(which  are  not  shared),  clearing  the 
vmous  buffers  before  rendering  begins,  setting  the  projection  matrix,  and  Eynchronizing  the  swap¬ 
ping  of  the  back  display  buffer.  It  may  also  have  the  responsibility  of  rendering  information  for  a  do¬ 
main  if  the  user  has  elected  to  use  the  display  host  as  an  active  node  in  the  simulation.  The  visualiza¬ 
tion  clients  on  the  remote  processors  render  to  the  shared  window  on  the  display  host  through  the 
multi— threaded  Xll  Server,  not  the  visualization  client  running  on  the  display  host.  This  greatly 
simplifies  the  programming  aspects  of  distributed  applications. 

Communications  between  the  visualization  clients  are  primarily  one-way,  that  is  flowing  to¬ 
ward  the  display  hosts.  The  OpenGLI]  calls  are  tokenized  on  the  remote  clients  and  sent  to  the  Xll 
Server  running  on  the  display  host  using  the  GLX  extension  provided  by  Silicon  Graphics.  Thus,  the 
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Figure  1.  Process  Topology  of  Indirect  Rendering 

programmer  can  think  of  indirect  rendering  simply  as  providing  addition  input  streams  into  the  ge¬ 
ometry  engines  on  the  display  host.  Since  this  operation  is  often  performed  unsynchonized  thes<^ 
conversion  may  not  be  unique  since  these  tokens  may  arrive  unordered  to  the  server.  This  is  not  gen¬ 
erally  a  problem,  because  most  operations  are  not  dependent  on  the  ordering.  However  if  alpha  1^- 
enngffor  example)  is  enabled  with  background  blending,  then  the  ordering  of  the  operations  are  im- 
portant  and  care  must  be  taken. 

1t  h  often  necessary  to  communicate  information  from  the  display  host  to  the  remote  dients 
Three  diffeiwt  methods  are  described  herein  with  the  choice  of  methods  dependent  on  the  needs  of 
the  apphcation.  If  the  information  is  ^obal  in  nature,  then  X  Wndows  provides  the  mechanism 
Imown  as  properties,  which  are  shared  among  local  and  remote  processes  shm^a  common  window. 
If  My  client  attempts  to  change  a  global  properly,  then  a  PropertyChangeEvent  is  sent  to  aU  dients’ 
wluch  respond  wito  a  request  to  read  the  new  property  and  thus  synchronize  the  data.  If  point- to- 
^mt  commiMcation  is  required,  then  a  simple  method  is  to  post  a  window  (such  as  Motif  or  Xt  win¬ 
dow)  on  the  displ^  processor  where  the  user  can  use  graphical  techniques  such  as  buttons  or  sMers 
to  conv^information.  A  third  technique  is  to  use  client  deCmed XEendEuents  that  support  ihepass- 
of  arbitraiy  information  between  local  or  remote  processes.  Care  must  be  exercised  when  ex- 
cha^ing  data  between  heterogenous  machines,  as  data  incompatibilities  may  arise.  This  problem 
can  be  avoided  if  the  data  is  enooded/decoded  using  the  publidy  available  XDR  routines. 

\^en  double  buffered  graphics  are  required  on  the  display  host,  it  is  the  responsibility  of  the 

o^erofthevwndow(thevisualizationprocess  on  the  display  host)toactuallysw^thedisplaybuffers 

^r  the  remote^ents  have  completed  their  rendering  operations.  To  perform  this  operation,  the 
local  ^ent  must  have  two  pieces  of  information:  (1)  the  number  of  remote  clients  sharing  the  window 
and  (2)  notification  when  all  clients  have  completed  their  operation.  To  satisfy  (i),  the  visualization 
cue^  simply  ClierUMessage  telling  the  client  on  the  display  host  that  a  new  remote  client  is 

registenng  Itself.  Thelocalclient  simply  keeps  track  of  the  number  of  registered  clients.  This  in¬ 
formation  coupled  wth  the  XSynchronization  Extension  to  X  Wndows  allows  the  client  on  the  dis¬ 
play  host  to  maintain  a  synchronized  vrindow. 

4.  TIGER  SYSTEM 

The  devdopment  of  TIGER  has  evolved  from  a  grid  generation  Qrstem  into  an  integrated  cys- 
^Q)e<aal^inappHcationsmintatingturf>o-ma^  The  integrated  ^tem  is  comprised  of 
6  modules:  (0  ^d  gene^on  module,  (w)  visualization  module,  (ui)  network  module,  (to)  flow  solu- 
on  module,  (o)  simulation  module,  and  (oi)  toolbox  module.  Elach  module  may  be  accessed  indepen- 
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igurc  2.  Graphical  User  Inierface  of  TIGER 
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panels.  1  ho  global  panel,  where  the  logo,  message  browser  ''‘'•''ous  sub-panels  and  auxiliaiy 

functmn  buttons  reside,  is  the  main  panel  of  theiTre  GU  other  global 

left-hand  upper  corner  of  the  global  panel  and  is  the  w^nHnw  n  sits  in  the 

the  solution.  Sub-  panels  such  as  the  "OVU”  na„cl  Tn  nn  ‘he  user  to  view  the  grid  and 

panel  as  the  algorithm  requests  user  inputs.  ThLe  pa’nds  oroWde  G«obal 

lar  group  of  user  inputs.  Auxiliary  panels  such  as  the  widgets  for  a  particu- 

els,  pop  up  only  upon  the  user’s  r^uest  for  paramew  c.^?  pan- 

ti":r\“r2:JreTto';~ 

network  module  as  the  data  bridge  that  conveys  u.C*s  ^  ^  is  to  use  the 

local  graphics  workstation  and  a  remotesupercomputer  r  '’otween  the 

erver  sub  module,  which  resides  in  TIGER  as  a  function  -i  of  two  sub-modules:  theclierat- 

.sasetof  routines  thatresideindependentfyonthc.^mrmaf^^^^ 

ule.  This  code  applies  the  thin-layer  aDn^oximit!!f  ^  ^  o  as  the  flow  solution  mod- 

solve  the  Reynolds-averaged  Navier-StXs  StionT  ^  f  turbulence  model  to 

flux  Jacobians  evaluated  by  flux-vector-snlitt^n.,  finite  volume  scheme  with 

Standard  LU  factorization  to  enhance  the  code's  stahilitv 

technique  115-161  to  the  buffer  sonrbetw^ the  ^  e^d  distortion 

compressible  flow.  It  performs  flow  eele.!ift^on  on  thelnl!”'^®  ^  achieve  time-accurate  solution  for 
on*2^  *nterval,  whose  length  is  specified  by  the  user  The^wd^*^^!’  ?“tputs  the  flow  solution 
on  the  screen  graphically  through  the  ex^utionrf  vil.«l.w  ^  '®  automatically  updated 
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methods,  such  as  wireframe  and  Gouraud  shading, 
and/or  solution  can  be  animated  d3rnamically. 


Time- dependent  grid 


contour  and  vector  field. 


Figure  3.  Tiger  Network  Topology  Figure  4.  Simulation  Module  Setup 


5.  COMPUTATIONAL  EXAMPLES 

“‘deration  of  the  distributed  paraUel  environment  with  distributed 
al^Uon,  a  four  blo^3D  test  case  was  performed  using  a  four  block  16mb  SGI  Indigo  Elanc 
workstations  on  a  W  mb  Ethernet  network(Figure  5  shows  the  block  configuration  on  the^ertical 
QTnmetiy  plane).  The  ^t  case  is  an  mternal  engine  configuration  with  blades  removed.  Figure  6 
a  SIX  frame  sequence  of  pressure  on  a  vertical  symmetry  plane  as  captured  directly  from  the  screen. 

.  .  by  network  traffic  is  almost  nedieible  To  further 

mmi^  thie  effect,  the  X  server  on  the  display  host  is  capable  of  caching  display  li^  c^ted  on  ei- 

retiWe^md  wk  ‘“/““I  is  that  objects  that  have  hot  changed  are  not 

retransferred  across  the  network,  thus  local  redraws  even  for  complex  objects  are  veiy  fast. 

.  .  For  toegwen  test  ^es,  the  latency  created  by  network  traffic  is  almost  negligible  TV>  further 

tihSre^S;!^®  r  ‘‘ispl^  host  is  capable  of  caching  displa^liS  creLdS  ei- 

not  dients .  The  impact  of  this  caching  is  that  objects  that  have  not  changed  are 

not  retransfered  across  the  network,  thus  local  redraws  even  for  complex  objects  are  veiyfi^ 


6.  FUTURE  WORK 

and  effort  ^tyiUustrates  the  need  for  an  integrated  desktop  tool  for  the  execution 

-fh”  ^  desktop,  while  generally  not  adding  any  new  capabilify,  aids  the 

a  commm*r  autoina^  functions,  or  simplifying  the  use  of  the  tools  by  bringing  ^n  together  in 

acommonGraphicalUserlnterfacefGUI).  Afewofthesetoolsarelistedbelo^ 

Graphical  based  inputonm  parameters- 

Cunratly,  NE^  reads  a  FORTRAN  namelist  input  file  for  information  regaiding  the 
^  fame  and  b^dary  conditions.  A  valuable  aid  to  the  user  would  be  to  pa- 
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I-  igure  7.  Grid  for  SR- 7  Figure  8.  Density  contour  for 

*  *’°  ^^*  the  user  to  indicate  boundary  conditions  bypf!!n<-a«f/- 

<  If)  ther  ebv  combining  the  editing  process  with  graphical  validation  of  the  input. 

An  automatic  mesh  decomposition  tool- 

This  feature,  given  a  set  of  rules  customizahl,.  by  th.-  user,  would  automaticallv  decompose 
an  aibitiai  V  set  of  structured  grid  blocks  into  the  number  of  equal  sized  gi-id  blocks  comen- 
sui  ale  with  the  number  of  available  proce.s.sors  in  tlie  distributed  environment  This  tool 
m-Ms  i  !o  information  of  the  blocks,  would  be  capable  of  breaking  structureli’ 

bi  erf- ,  c'^istinK  grid  boundaries  assuming  the  grid 

Visualization  monitoring  tool- 

riii.s  would  allow  the  user  to  giaphically  monitor  the  state  of  a  running  simulation  using  the 

round'sha^-d^n  1'"'^'  -P'''>'<-‘s«ing  Packages  such  as  line  or  ga- 

nnh^a  ^  ^  ^  '•‘-•‘-•tor  plots|5|,  streamlines,  and  partide 

pathsM.  I  to  name  a  few.  I  he  interface  would  provide  feature  detection  such  as  monitoring 

UvJ  The "’assivc  rfcfeets  in  total  temperature  or  pressure,  to  nam<. 
two.  1  he  ability  to  monitor  global  as  well  as  Iwal  gi  id  convergence  at  a  glance  would  provide 
instant  information  on  the  health  of  the  simulation.  ^ 

Performance  monitoring  of  the  distributed  parallel  system - 

Ihis  feature  would  allow  the  user  to  monitor  the  performance  of  the  distributed  system 
S^dSor  '•-‘‘"'-'dthoreforeonly require merginginto 

we  look  fomard  into  the  future  of  CFS  as  defined  by  the  requirements  for  multi-discinlin 
aiy  optimization  and  analysis(MDO&A),  one  can  define  requirements  th“st  L  pTesenU 
gcnci  ation  systems  to  provide  adc-quate  functionality  in  an  integrated  system.  Theso’^includo  ^ 

^  changes  in  either  geometry  or  topology.  Most  current  grid  pro- 

grams  either  don  t  support  editing,  or  do  it  very  poorly. 

2.  Clo^ly  related  to  1,  grid  components  must  remember  and  maintain  design  constraints  dnr 
mg  the  editing  process.  Without  default  editing  methods,  1  would  be  SS. 

process  must  not  bo  required  to  have  intimate  knowledge  of  the 
is  altefiS^^^bTr*^^  example,  if  the  location  of  a  bezier  control  point  attached  to  a  line 
^fnt  K  “""ceted  to  the  edge  of  a  surface,  the  procedure  that  requests  the 

^int  to  move  ne^  not  be  required  to  specify  the  series  of  steps  necessary  to  reconstruct  the 
surface.  That  information  should  be  associated  with  the  surface  itself. 
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4  The  editing  process  must  not  be  linked  uniquely  to  the  GUI.  Anyone  or  any  process  should 
be  able  to  execute  an  edit,  not  just  the  indivual  vfith  the  mouse. 

6.  EkUts  should  occur  in  parallel. 

6  Grids  should  be  constructed  in  a  collaborative  environment.  Many  individuals  should  be 
able  to  work  on  components  of  a  compUcated  grid  simultaneously. 

7  Grids  components  should  be  constructed  relative  to  arbitrary  defined  rob-ordinate  coordi- 
■  nate  systems  which  are  free  to  relocate  in  3  space.  This  construct  would  be  an  extension  of 

a  grouping  operator  in  which  the  geometic  operators  of  translation,  scalmg,  and  rotation 

are  defined. 
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ABSTRACT 

A  structured  grid  based  technique  is  presented  to  enhance  tiie  predictive  capability  of  widely  used 
CFD  codes  through  the  use  of  solution  adaptive  gridding.  The  procedure  redistributes  mesh  points 
based  upon  an  existing  flowfield.  A  newly  developed  weight  function  employing  Boolean  sums  is 
utilized  to  rq)resent  the  local  tnmcation  error.  This  weightfraction  is  then  used  toconstruct forcing 
functions  associated  with  an  elliptic  system:  A  NURBS  i^resentation  is  employed  to  define  block 
surfaces  for  boundary  point  redistribution.  The  technique  has  been  applied  to  a  flowfield  about  a 
configurationofpracticaiinterest.  Thisflowfieldinvolvessupereonicfreestteamconditionsatangle 
of  attack,  andexhibits  largie  scale  sq)arated  vortical  flow,  vortex-yprt^  and  vortex-surfaceinterac- 
tions,  sqjarated  shear  layers  and  multiple  shocks  of  different  intensity.  The  results  demonstrate  the 
capability  of  the  devdoped  weight  function  to  detect  shocks  of  differing  strengths,  primary  and  sec¬ 
ondary  vortices,  and  shear  layers  adequately. 

INTRODUCTION 

Rapi^d  access  to  hi^y  accurate  data  about  complex  configurations  is  needed  for  multi-disdplinaiy 
optimirMtion  and  design.  In  order  to  efficiently  meet  these  requirements  a  closer  coupUng  between 
the  analysis  algorithms  and  tire  discretization  process  is  needed.  In  some  cases,  such  as  flee  surface, 
temporally  varying  geometries,  and  fluid  structure  interaction,  tiie  need  is  unavoidable.  Ih  other 
cases  the  need  is  to  rapidly  generate  and  modify  higjiqnahty  grids.  Tbcfamqiies  such  as  unstructured 
and/or  solution-adaptive  metiiods  can  be  used  to  speed  the  grid  generation  process  and  to  automati¬ 
cally  cluster  mesh  points  in  regions  of  interest  Global  features  of  the  flow  can  be  significaiitly  af¬ 
fected  by  isolated  regions  ofinadequatelyresolvedflow.  Theseregions  rnaynotexhibithighgradi- 
ents andean bedifificultto detect  Thuscxoessivetesolutionincerfainiegionsdoesnotnecessarily 
increase  the  aocuracty  of  tiie  overall  solutiori. 


matelyd^iendsonthefcatutedetectionalgotitiiniusedtodeterininesolutiondomainregioiiswhich 

require  a  fine  mesh  for  tiieir  accurate  representation.  Typically,  weight  functions  are  construt^  to 

rninuc  thelocaltruncalio^  error  andrnay  require  substantialuserinputMostproblerns  of  engineer- 

ing  interest  involve  multi^lock  grids  and  widely  diqiarate  length  scales.  Hence,  it  is  desirable  that 
the  adaptive  grid  feature  detection  algorithm  be  developed  to  recognize  flow  structures  of  different 
type  as  well  as  intensity,  and  adequately  address  scaling  and  normalization  across  blocks. 
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These  weight  functions  can  &en  he  used  to  construct  blending  functions  for  algebraic  redistributi 

interpolation  factions  for  unshliotured  grid  generation,  forcing  fjmctions.  to.  attract/cepelpdints^^ 

an  elliptic  system,  or  to  trigger  local  refinement,  based  upon  applifcation  of  an  equidistribution  Drin- 
ciple.  The  popularity  of  solution-adaptive  techniques  is  growing  in  tandem  with  unstructured  meth” 
ods.  The  difficultly  of  precisely  controlling  mesh  densities  and  orientations  with  current  unstru  * 
hired  grid  generation  systems  has  driven  the  use  of  solution-adaptive  meshing  Use  of  derivaSJ^” 
of  density  orpressurcare  widelyusedforconstructionof  such  weightfunctions,  andhavebeeno  ^ 

en  very  successful  for  inviscid  flows  with  shocks[2,7.1 1].  However,  less  success  has  been  reS^ 
for  flowfields  with  viscous  layers,  vortices  or  shocks  of  disparate  strength.  It  is  difficult  to  maintain 
the  appropriate  mesh  point  spacing  in  the  various  regions  which  require  a  fine  spacing  for  adequate 
resolution.  Mesh  points  often  migrate  from  important  regions  due  to  refinement  of  dominam  fea¬ 
tures.  An  example  of  this  is  the  well  know  tendency  of  adaptive  methods  to  increase  the  resolution 
of  shocks  in  the  flowfield  around  airfoils,  but  in  the  incorrect  location  due  to  inadequate  resolution 
of  the  stagnation  re^on.  This  problem  has  been  the  motivation  for  this  research. 

The  weight  functions  developed  utilize  scaled  derivatives  and  normalizing  procedures  to  miniirii^A 
or  eliminate  the  need  for  user  input.  The  most  attractive  feature  of  this  work  is  the  ability  to  detect 
flow  features  of  varying  intensity  and  the  lack  of  user  defined  inputs  for  the  selection  of  the  weight 
function. 

In  this  research  a  NURBS  representation  is  employed  to  define  block  surfaces  for  boundary  point 
redistributioiL  The  features  described  have  been  implemented  into  Adapt2D/3D.  An  adaptive  grid 

systemcapableofautomaticallyresolvingcomplexflowswithshock  waves,  expansion  waves,  shear 

layers  and  complex  vortex-vortex  and  vortex-surface  interactions.  An  adaptive  grid  approach  • 
seems  wdl  suited  for  such  problems  in  which  flie  spatial  distribution  of  lengfii  scales  is  not  known 
apriori.  •  ■  ••  •'  ;  .  •  .  . .  •  •  : 

APPROACH  TO  ADAPTION 

The  elliptic  generation  system: 

-  0  (1) 

/-I  y-i  t-i 

where  r  :  Positiori  vector, 

gy  rContravaiiant  metric  tensor 
:  Curvilinear  coordinate,  and 
Pk  :  Control  function. 

is  widely  used  for  gnd  generation  [1].  Control  of  the  distribution  and  characteristics  of  a  grid  system 
canbeachievedby  vaiyingthe  values  of  tiiecontrolfunctionsPkinEquation  1  [IJ.  The  application 
of  the  one  dimensional  form  of  Equation  1  combined  wift  equidistribution  of  the  wd^t  function 
results  in  fiie.ddSnifion  of  a  set  of  control  functions  for  fiuee  dimensions. 


a  =  1.2.3) 


These  conbol  functions  were  generalized  by  Rififtman  [2]  as: 

■  _ 


y-i  ^  ' 


(2) 

(3) 
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In  order  to  conserve  some  of  the  geometrical  Characteristics  of  the  original  grid  the  definition  of  the 
control  functions  is  extended 'as:'  .  ’  . 

P,  =  +  cfPJ  6  =  1,2.3)  (4) 


where  P  initial  geometry  •  Control  function  based  on  initial 

geometry 

Pwt  :  Control  function  based  on 

current  solution 

Ci  :  Constant  weight  factor. 

These  control  functions  are  evaluated  based  on  the  current  grid  at  the  adaption  step.  This  can  be 
formulated  as: 

pi..y  ^  a  =  1,2.3)  (5) 

where 

p(i)  ^  p(o)  +  cX/’j"”  a  =  1.2.3)  (6) 

A  flow  solution  is  first  obtained  with  an  initial  grid.  Then  the  control  functions  Pi  are  evaluated  in 
accordance  with  Equations  2  and  5,  which  is  based  on  a  combination  of  the  geomeUy  of  the  current 
grid  and  the  weight  functions  associated  with  the  current  flow  solution[ll]. 

Evaluation  of  the  fording  functions  corresponding  to  the  grid  input  into  the  adaptation  program  has 
proven  to  be  troublesome^  Direct  solution  of  Equation  1  for  the  forcing  functions  using  die  input 
grid  coordinates  via  Oramer*s  rule  or  libraries  was  not  sucoessfuL  For  some  grids  with  very 
hi^  aspect  ratio  cells  and/or  very  rapid  changes  in  cell  size,  tiie  fordng  functions  became  very  large. 
T3iet&e*of  Mydi£feretiidng;sdietoootiler  tiian  thConeUsed  tQevaluatetiiemCtrics;suGh:as  thehybrid 
upwind  scheme[8],  wouldresultinveiylaigem^hpointmovements.  An  alternative  tedmique  for 
evaluating  the  fordng  functions  based  on  derivatives  of  the  metrics  was  implemented[3]. 


Pt 


2  8u  '*'2  gn  ■‘‘2  «SJ 


(«• 


1,2.3) 


Tliis  tedinique  has  proven  to  be  somewhat  more  robust,  but  research  efioris  are  continuing  in  this 
area. 


SOLUTION  PROCEDURE 


The  software  developed  inputs  two  PLOT3D  format  files  [4],  one  for  the  grid  and  one  for  tiie  flow 
solution.  Thesefilescontaindienumberof  blocics,  thegridsize,  tiiegridcoordinatesandtiiesolution 
vector.  Ouqiut  consists  of  an  NPARdS]  restart  file,  as  well  as  two  PLOT3D  files  of  the  adapted 
grid  and  the  flow  solution  intetpolafeddnto  die  new  grid.  Amultirlilock  treatment  capabil^  is  in¬ 
cluded.  Theadaptivegridisoonsfrucledintiireestqis.  Thefiiststqiistogeneratediewrig^^c- 
tions,  which  due  to  their  critical  importance  will  be  discussed  in  detail  in  a  s^arate  section.  The 
second  stq)  is  to  generate  the  actual  adapted  grid  by  equidistribution  of  the  aforementioned  wei^t 
function.  In  the  current.wmfc  tilts  is  aocompltsh^  by  the  numerical  solution  of  Eqpiation  I. .  A 
coupled  three-dimensional  strongly  implicit  procedure  (CSIP)  as  described  by  GhiaetaL  [til  has 
been  implemented  for  the  solution  of  the  discretized  equations.  Upwind  differendng,  witii  biasing 
based  on  the  sign  of  the  forcing  functions,  as  well  as  central  differendng  has  been  implemented  and 
studied  for  the  first  derivative  terins.  The  first  order  upwind  differencing  increases  the  stability  of 


the  procedure,  but  at-  the  expense  of  smearing  the  clustering.  Hence,  a  hybrid  up  wind/ceniral 

"■  '.differencihg^  scheme  has  been  implemented  to  less^  this  smearing  while  maintaining'the  smooth¬ 
ness  and  stability  of  the  upwind  procedure.  Central  differencing  has  been  employed  for  all  second 
and  mixed  derivative  terms.  All  non-linear  tenns  were  treated  by  quasi-lineaiizadoo;  Since  Equa¬ 
tion  1  is  solved  iteratively,  the  forcing  fimctions  must  beevaluatedfor  each  new  successive  grid  point 
location.  The  fordng  functions  are  obtained  by  inieipoladon  from  the  origin^  grid.  Theintetpola- 
tion  procedure  employed  was  that  used  for  the  non-matching  block  to  block  interface  capability  of 
the  NAPRC  [5]  code.  This  scheme  is  based  on  trilinear  mteq)olation.  Upon  convergence  of  this 
process  the  flow  solution  is  interpolated  onto  the  adapted  grid  using  the  same  interpolation  subrou¬ 
tine.  Calculations  can  then  be  continued  from  the  new  restart  file.  This  procedure  can  then  be  re¬ 
peated  until  an  acceptable  solution  is  obtained.  Experience  indicates  that  coupling  this  procedure 
with  a  code  capable  of  treating  time  accurate  grid  movement  would  ease  this  process  and  lessen  the 
CPU  requirements. 

Currently,  grid  adaptation  is  performed  in  an  uncoupled  manner.  For  simple  problems  this  is  ade¬ 
quate  as  only  marginal  changes  are  observed  after  two  to  three  cycles  of  adaptation.  However,  for 
complex  flow  fields  it  appears  necessary  to  closely  couple  the  flow  solver  and  the  adaptation  tech¬ 
nique  as  significant  changes  in  both  grid  and  flow  solution  are  visible  after  even  six  cycles  of  un¬ 
coupled  adaption.  Also,  the  more  complex  flows  require  the  use  of  a  hybrid  differencing  scheme 
for  Ae  solution  of  the  grid  equations .  For  flows  containing  strong  shocks  and  shear  layers  along  with 
weaker  structures,  central  differencing  was  unstable  and  first  order  upwind  differencing  of  the  grid 
equations  smeared  the  weaker  structures.  Therefore  to  obtain  full  benefit  of  the  active  grid 
blended  centraVupwind  has  been  implemented  for  the  grid  equations. 

WEIGHT  FUNCTIONS 

•  y^jplicationof.theequidistribufionlawresults  in gridspadnginversclypropprtio.nal  to.  the  weight 
function,  and  hence,  the  weight  function  detnmines  the  grid  point  ^tribution.  Ideally,  tire  wei^t 
would  be  the  local  truncation  error  ensuring  a  uniform  distribution  of  error.  However,  evaluation 
of  the  higher-order  derivatives  contained  in  the  truncation  error  from  the  available  discrete  data  is 
progressively  less  accurate  as  the  order  mcreases  and  is  subjectto  noise.  Determination  of  this  func¬ 
tion  is  one  of  the  most  challenging  areas  of  adaptive  grid  generation.  Lower-order  derivatives  must 
be  non-zero  in  regions  of  wide  variation  of  higher-order  derivatives,  and  are  proportional  to  the  rate 
of  variation.  Therefore,  lower-order  derivatives  are  often  used  to  construct  a  wdght  function  as  a 
proxy  for  the  truncation  error.  Construction  of  these  weight  functions  often  requires  the  user  to  spec¬ 
ify  which  derivatives  and  in  what  proportion  they  are  to  be  used;  This  can  be  a  time  consuming  pro¬ 
cess.  Also,  due  to  the  disparate  strength  of  flow  features,  important  features  can  be  lost  in  the  noise 
ofdominantfeatures.  The  wdgfrtftmctionsdevelopedby  Soniand  YaqgfT]  andThombuigandSoni 
[8]  are  examples  of  such  efforts.  The  weigfrt  function  ofThombuig  and  Soni  [8]  has  tiie  attractive 
featureofrequiringnouserspecifiedinputs.  Relativederivativesareusedtodetectfeaturesofvaty- 
ing  intensity,  so  that  weaker,  but  important  structures  such  as  vortices  are  accurately  reflected  in  tiie 
wdlgfit  function.  Ih  addition,  each  conservative  flow  variable  is  scaled  indq)eodently.  Onesided 
differences  are  used  at  boundaries,  and  no^«lip  boundaries  require  special  treatment  since  the  veloc¬ 
ity  is  zero.  This.case  is  handled  in  the  same  manner  as  zero  vdodty  regions  in  tiie  field.  A  small 
valuer  q>silon  in  equation  8,  is  added  to  all  normalizing  quantities.  In  the  present  work  tills  weight 
function  has  been  modified  using  the  Boolean  sum  construction  metiiod  of  Soni  [7].  Also,  several 
enhancements  of  an  implementation  nature  have  been  employed.  For  example  epsilon  has  been 
placed  outside  the  absolute  value  operatm'.  This  eliminated  the  possibility  of  spurious  gradients  in 
the  weight  function  in  regions  where  q}silon  was  nearly  equal  and  opposite  in  sign  to  the  local 
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Figure  1,  Comparison  of  Weight  Frmctions.  Figure  2.  Comparison  of  Adapted  Grids. 


It  can  t^e  pbserv<^  that  both  weight  fonctions  cicarly  detected  the  primary  shock.  It  can  al$o  be 
that  the  expansion  fan,  boundary  layer,  arid  the  reflect^  shocks  are  much  morecleily  represe^T^ 
in  the  current  weight  fim^on.  Ad^ted  grids  using  both  weight  function  formulations  are  piesenteH 

in  Fig.2.  The  high  gradient  regions  of  the  expansion  r^on  are  only  rdEIected  in  the  adapted 

using  the  new  weight  fimetioa  The  reflected  shock  is  also  much  sharper.  Kgure  3  compares^^ 
solution  obtained  using  the  current  adaption  procedure  with  that  obtained  using  the  oriebal  ^ 
The  enhanced  resolution  is  tdeatly  evident  Stid. 
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Figure  3.  Comparison  of  Solutions  Using  Adapted  Grid. 

BOU^®AR¥;PO^^n^RE0IST«IBUTION^  -  .... 

Accurate  representation  of  the  flowfteld  in  the  viciniQr  of  boundaries  is  critical  for  an  acceptable 
overall  solutioa  Physics  occurring  hear  boundaries  often  drive  the  flow  physics  occunirtg  in  other 
regions  of  the  flowfi^d.  This  is  espedaUy  true  for  noslip  surfaces.  H^ce,  die  quality  and  distribu¬ 
tion  of  the  grid  in  ^  region  is  of  critical  importance.  For  the  implementation  of  many  turbulence 
models  or&ogonality.is  also  required.  When  using  an  adaptive  procedure  based  on  a  redistribution 
of  mesh  points,  such  as  in  this  work,  the  interior  points  move  as  the  grid  is  nrfapted.  This  leads  to 
distorted  cells  if  die  boundary  joints  are  not  redistributed  as  the  grid  is  adapted.  Both  grid  quality 
Md  g^metric  fidelity  must  be  in^tained  during  redistribution.  In  the  current  work  all  surfaces  of 
individual  blocks  are  treated  in-die  same  manner,  whether  block  interfaces  or  flow  boundary  condi¬ 
tions.  IWost^  must  be  performed.  Histfhegeomettyisdefined,andsecondlydiesurfacemesh 
is  regenerated  using  agiven  distribution  mesh.  The  geometry  is  defined  as  a  MURB  surface  from 
the  concent  surface  mesh  to  be  redistnbuted.  This  prooedoie  is  based  on  the  algoritiims  developed 
by  Yu  and  Som  Genie++£9].  This  definition  is  flien  used  to  generate  die  surfeoe  uaing  a  user 
specified  distribution  mesh.  Theentiresurfiiceorasubrcgioncmberedistributed.  Subregions  can 
be  used  to  fix  pomts.  such  as  sharp  comets  or  a  ftansifion  point  between  boundary  oon*titi<^n  type. 
Fbr  example  block  boundary  tof  noslip  surface.  The  distribution  mesh  is  a  [0,1]  square  evaluated 
firotnaqiedfiM  surface  based  upon  arc  length.  For  solid  surfaces  the  distribution  mesh  is  based 
on  the  near^interitx  coordinate  surfaces.  The  spadng  between  surfaces  is  small  and  the  surfiu^ 
are  of  a  similar  geometric  shape.  Therefore  the  normal  coordinate  is  neatly  orfliogonal  to  the  surface. 
Block  interfaces  are  treated  by  redistributing  the  current  block  surface  based  on  its  corresponding 
surface  in  die  ndlghboring  block. 


FLOW  AROUND  GENERIC  MISSILE  CONFIGURATION 

Supersonic  flow  at  Mach=i.45  arid  14  de^ee  angle  of  attack  has  be^  simulated  around  af«n 
ogive  cylmder.  The  grid  constructed  after  two  adaption  cycles  using  hybrid  differencina  of 
equations  and  (he  current  weight  functions  is  presented  is  presented  in  Figure  4.  ^ 
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Figure  4.  Adapted  grid  after  two  cycles. 


Figure  5.  Adapted  grid  after  two  cycles. 


Figure  5  presents  the  grid  constructed  using  the  previous  weight  function  and  the  same  flow  condi¬ 
tions  and  number  of  adaptation  cycles.  Figures  5  and  6  present  streamwise  cuts  of  the  two  erids 

shownmFigs4and5atX;^=5^and7^respectivcly.ThelefthadsideofFigures6and7<Se- 

Spend  to  Figure  5  and  the  rigjit  hand  side  corresponds  to  Figure  5. 


Figured.  X/Da:55 


Figure7.XyDss8J. 


As  can  be  seen  from  Figures  4-7  the  feeding  sheet,  and  both  primaiy  and  secondary  vortices,  as  well 
as  the  strong  shock  at  the  nose  are  cleariy  visible.  Comparison  of  Figures  4  and  5  show  flie  sharper 
resolution  provided  by  new  fonnulation  of  the  wdlght  function.  The  shock  itsdfis  more  sharply 
r^resented,  biit  the  greatest  improvement  is  in  representing  the  vortex  and  the  shear  layer.  The  pre¬ 
vious  foraulation  simply  clustered  inesh  points  in  the  vicinity  of  the  vortex.  This  does  improvereso- 

lution.  However,  the  new  formulation  clearly  shows  the  circular  shape  of  the  vortex  asweUasthe 
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high  gradient  regioi^  on  the  edges  of  the  structure.  Hence,  the  new  method  of  grid  adaptation  m 
closely'reproduces  die  flow  physics.  Also,  stronger  and  shaq}er  concentration  of  mesh  pointT^^ 
observed  with  the  new  fonnulation.  This  is  most  apparent  in  the  far  field  re^on  and  the  greater  detaU 
of  the  flowfield  visible  in  die  grid.  Both  theprimaiy  and  secondary  separation  points  are  well  repre 
sented.  On  the  windward  side  of  the  missile  it  can  be  seen  that  the  boundary  layer  clustering  is 
ly  increased  by  the  adaptation  procedure.  This  is  desirable  in  this  area  as  the  boundary  layer  is  the 
only  feature  of  interest  Scamination  of  Rg  4  reveals  that  toward  the  aft  end  of  the  missile,  the  off 
surface  structure  is  not  well  defined.  This  is  because  after  two  <ycles  of  adaptation  the  structure  is 
not  well  enough  defined  by  die  flow  solver  and  associated  grid  for  the  adaptation  procedure  to  sharo- 
ly  define  the  structure.  Only  structures  that  have  been  at  least  partially  resolved  by  the  flow  sol^r 
can  be  detected  by  the  weight  function.  Hence  the  quality  of  the  weight  function  is  dependent  upon 
the  quality  of  the  solution.  It  should  be  noted  that  resolution  of  the  flowfield  improves  with  each 
cycle  of  adaptation.  The  adaption  procedure  and  the  flow  solver  should  be  coupled  so  that  the 
adapted  grid  can  reflect  all  the  features  that  are  detected  as  the  solution  progresses  and  improves  due 
to  adaption. 

Figure  8  presents  the  flow  solution  obtained  using  the  NPARC  [5]  flow  solver,  and  the  KE  turbu¬ 
lence  model  option.  Figure  9  presents  the  associated  weight  function. 


CONCLUSIONS 

The  results  shown  demonstrate  the  capability  of  tiie  developed  weight  function  to  detect  shocks  of 
differing  strengths,  piimaiy  and  secondaiy  vortices,  and  shear  layers  adequately.  For  tiie  test  case 
presented  the  increased  resolution  of  the  shock  and  the  expansion  region  resulted  in  the  structure 
increasinginsizesotiiatinferactionwitiithefarfieldoocuired.  Thus,tiiefarfieldboundariesshould 
be  placed  at  a  larger  distance  from  the  body.  No  user  input  is  required.  It  is  imperative  tiiat  the 
adaptation  process  be  coupled  with  the  flow  solver  as  e3q)erience  indicates  tiiat  the  complex  flow- 
fields  may  require  many  adaptive  cycles.  This  is  particularly  critical  for  flow  fidds  that  are  not  well 
r^reseni^  by  the  initial  solution.  The  adapted  grids  allow  the  use  of  larger  timft  steps  to  increase 
the  convergence  rate.  However,  the  single  greatest  benefit  resultirtg  from  the  adapted  grid  is  the  low¬ 
ering  of  the  artificial  viscosity  required  for  stability.  The  adapted  grid  aligns  and  dusters  near  shocks 
and  shear  layers.  For  the  test  problem  a  dramatic  decrease  in  the  artificial  viscosity  coefficients  is 
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possible  when  running  on  the  adapted  grid.  This  results  in  off  surface  structures  which  are  mu  h 
sharper  and  more  closely  resemble  the  flow  visualization.  The  boundary  layer  on  the  windward  sid 
also  became  thinner  and  the  normal  grid  spacing  was  decreased  by  the  adaptive  procedure  ^ 

Currently  multi-block  problems  are  being  simulated  to  evaluate  this  capability.  Attention  is  be’ 
placed  upon  across  block  scaling  for  the  weight  iimctions.  A  global  maximum  across  all  blo^ 
seems  to  work  well  for  the  normalization  of  weight  function  components.  However,  this  issue  is  be 
ing  monitored  as  more  complex  problems  are  attempted.  A  further  problem  arises  due  to  equidi  ' 
tribution  via  forcing  functions.  Tto  does  not  appear  to  be  a  problem  where  a  block  face  is  cowect^ 
to  one  and  only  one  face  of  another  block.  Problems  have  been  encountered  for  a  multi-block  i, 

vehicle  computation  where  a  block  face  consisted  of  a  block  to  block  connection  and  a  solid  surfa^ 
for  the  base  region.  A  smooth,  continuous  set  of  forcing  functions  across  this  block  boundary  does 
not  yield  a  smooth  grid.  It  is  believed  that  this  is  due  to  the  elliptic  character  of  the  grid  equations 
The  simple  fix  is  to  introduce  artificial  block  boundaries  to  force  one  to  one  conrespondence.  Work 
is  continuing  on  these  issues  as  well  as  coupling  with  a  flow  solver,  unsteady  flows,  and  more  com¬ 
plex  multi-block  problems.  These  results  will  be  published  at  a  later  date. 
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donJJ {PMAGToMbSlt  three  dimensional  multiblock 

cessors,  MP I  is  used  for  message  passing  On  ^  sses  that  maybe  distributed  over  multiple  pro 

threads.  T^iridsareredistrlutedaslsZutiffofti^'St^aSdT'^^^^^ 

IS  computed  as  a  boolean  sum  of  the  scaled  eradJ^tln«f^  I  equations.  Weight funcLn 

condi^has  been  implemented  using  NURBS  to  tnaintaineZmeti^  variables.  Neumann  boundary 

^•^J’tgmverseNURBSformulationAparallelSZSX^^Jllf’^'^f'^^^ 


1.  Introduction 

Grid  generation  is  the  starting  point  of  anv 
computational  fluid  dynamics  simulatiSi.  CmaS 

®  multiblock  do- 

^hasa^waysb^noneofthe  mosttime  consum¬ 
ing  a^ts  m  flow  simulation.  Grids  have  to 

geon^etiy  under  consider¬ 
ation.  Grid  pomts  have  to  be  dense  enough  to  re¬ 
solve  the  u^rtant  flow  features  since  the  tcuiacy 
of  anumencal  solution  depends  heavily  on  the  grid 
^  sufficient  knowledge  of  &e 

ow  physics  is  not  known  a-priori  to  increase  grid 
unpoitant  regions.  SoluSn 
ba^  ^d  adaptation  involves  distributing  the  grid 
man  optimimfashion  to  give  higherres^- 
rion  of  impor^t  flow  features.  There  are  three  dis- 

^tribution, 

r^einent/d«efinement,  and  local  increase  of  the 
order  of^  numencal  algorithm.  Grid  redistribu- 
^gebraicafly  or  by  solving  partial 
^erential  equ^ons.  Algebraic  methods  are  fast 
but  ^1^  to  Ughly  skewed  grids  in  complex  do- 

equations  on 

&e  other  hand  are  much  slowerbut  give  smooth  and 
close  to  orthogonal  grids.  The  control  fimctions 


give  a  faiity  accurate  control  over  grid  point  dis¬ 
tribution.  Tius  makes  it  highly  desiSle  to  use  el- 
^tic  equations  to  generate  solution-adaptive 

j  1  ^  ^  ^  stand-alone  grid  adaption 

m^ide,  the  flow  solver  is  stopped  aftSafewitera- 

ttons  to  ac^t  the  gnd  to  the  developing  solution. 
A  few  such  Iterations  result  in  a  completely  con- 
verg^solution.  This  requires  that  the  grid  adaption 
algonthms  be  very  fast.  Further  if  thiflowrolver 

fSLv?  for  grid  speeds,  then  the  adaption 

algonthm  should  mterpolate  the  solution  from  the 
earlier  gnd  onto  the  new  grid  before  the  flow  solver 

IS  restarted  TheParaUelMultiblockAdapti>r^J 

generator  ^MAG)  presented  in  this  paper,  was 
Aiw  conceived  with  the  motivation  of  creating  a 
fast  grid  ad^tion  algorithm  that  could  handle  any 
gcncrnl  multiblock  topology. 

2.  Goals 

™  Ki  ^  absolutely  no  restriction 

on  block  connectivity.  A  block  can  be  be  con- 

n^  to  any  block  including  itself,  thus  suppoit- 

to^fo^M^  complex  three  dimensional 


The  algorithm  should  adapt  a  multiblock 
grid  concurrently  with  each  block  solved  in  an  in¬ 
dividual  process.  These  processes  could  be  run  on 
a  shared  memory  parallel  machine  or  distributed 
over  a  network  of  workstations.  The  algorithm 
should  be  scalable.  Communication  latency 
should  be  minimum. 

Grid  adaption  should  require  minimum 
user  interaction.  It  should  resolve  all  important 
flow  features.  The  Neumann  boundary  conditions 
should  be  incorporated  to  maintain  grid  quality 
near  body  surfaces.  Solution  interpolation  must  be 
included. 


3.  Grid  Adaption 

3.1  The  elliptic  equations 


PMAG  generates  the  adapted  grids  as  a 
solution  to  the  elliptic  partial  differential  equa¬ 
tions.  Elliptic  equations  can  be  written  concisely 
as: 

i-iy-i  *-i 


The  grid  point  distribution  can  be  con¬ 
trolled  by  varying  the  control  functions  ap¬ 
propriately.  It  is  necessary  to  maintain  the  original 
grid  characteristics  during  grid  adaption.  Geomet¬ 
ric  control  functions  are  evaluated  from  the  existing 
grids  using  the  formulation  indicated  by  Soni  [1]. 

o*  _  f'>\ 

2^  gn  g22  ^  ^ 


3.2  Adaptive  weight  functions 

Adaptive  weight  functions  are  computed 
using  the  spring  analogy  form,  where  the  weights 
are  a  function  of  the  computational  coordinate.  The 
weight  function  investigated  by  Thornburg  and 
Soni  [3]  has  proved  to  detect  flow  features  of  vary¬ 
ing  strengths,  including  vortices,  boundary  layers 
and  shocks  of  disparate  strengths  and  requires  al¬ 
most  no  user  input  This  feature  is  certainly  very  at¬ 
tractive  for  a  general  multiblock  grid  adaptation 
algorithm.  Relative  derivatives  ate  used  so  that 
weaker  but  important  features  such  as  vortices  are 
also  reflected  accurately  in  the  weight  functions. 

Each  conservative  flow  variable  is  scaled 
independently.  A  small  value  epsilon  is  added  to  all 
normalizing  derivatives,  primarily  to  handle  zero 
velocity  regions.  This  epsilon  value  also  works  as 
a  filter  for  minor  disturbances.  A  boolean  sum  of 
the  scaled  derivatives  is  used  to  compute  the  weight 


fimctions  in  each  direction  which  are  then  normal¬ 
ized.  Weight  functions  in  each  direction  are  then 
added  with  another  boolean  sum  to  give  the  final 
weight  function. 

U7  —  _ W* _  ^  _ 
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The  control  functions  are  computed  from 
the  weight  function  as 


P* 

^adap  ^ 


(5) 


The  final  control  functions  are  obtained  as 
a  weighted  sum  of  the  geometric  and  adaptive  con¬ 
trol  functions. 

^  ~  ^gtomP geom  "h  ^adapPaiap  ( 6 ) 


3.3  Neumann  boundary  condition 

Boundary  point  movement  without  loss  of 
geometric  details  can  be  accomplished  by  using 
NURBS  definition  of  the  body  surfaces.  NURBS 
definitions  of  the  original  body  surfaces  are  created 
using  inverse  NURBS  formulation.  Boundary 
point  movement  is  achieved  by  reparameterization 
of  the  surface  to  preserve  slope  continuity  or  ortho¬ 
gonality  at  the  boundaries  as  required. 

Boundary  layers  typically  show  dense  grid 
packing.  It  is  quite  practical  to  assume  that  the  grid 
plane  next  to  the  boundary  would  follow  the  con¬ 
tour  of  the  boundary  fairly  accurately.  This  means, 
orthogonality  of  grid  lines  can  be  easily  achieved, 
if  the  boundary  has  a  point  distribution  identical  to 
that  on  its  neighbouring  plane.  Thus,  the  algorithm 
first  computes  NURBS  definition  of  the  grid  plane 
next  to  the  boundary  surface.  The  distribution 
space  of  this  plane  is  then  used  to  remesh  the 
boundary  NURBS  surface  thus  getting  almost 
identical  point  distribution  on  those  surfaces. 
PMAG  can  also  give  slope  continuity  instead  of 
orthogonality  if  so  desired  by  using  a  similar  strate¬ 
gy  [2]. 


3.4  Solution  interpolation 

The  fast  search  and  interpolation  algorithm 
used  in  this  code  takes  advantage  of  the  local  physi¬ 
cal  properties  by  mapping  the  global  geometry  in 
terms  of  local  coordinates  a,  p,  y  of  the  current  cell 
[5].  The  interpolation  point  is  expressed  in  terms 
of  these  local  coordinates  which  fall  within  a 
known  interval  (-1,  1)  if  Xp  lies  within  the  cell.  If 
any  of  the  coordinates  fall  outside  this  interval,  then 
the  next  cell  is  guessed  based  on  the  value  of  those 
local  coordinates.  The  next  guess  cell  can  be  diago¬ 
nal  to  the  current  cell  if  more  than  one  coordinates 
falls  outside  the  interval.  Further  if  any  of  the  coor¬ 
dinate  is  extremely  large  ( say  a  »  2  )  then  the  al¬ 
gorithm  can  intelligently  jump  more  than  one  cell 
in  the  appropriate  direction.  This  method  signifi¬ 
cantly  sp^s  up  the  search  process. 

4.  Parallel  Implementation 

Each  block  has  to  be  solved  in  an  indepen¬ 
dent  processes.  PMAG  spawns  processes  equal  to 
the  number  of  blocks  in  the  topology.  Each  block 
is  expected  to  be  stored  in  a  separate  disk  file.  This 
is  done  to  enable  concurrent  reading  and  writing  of 
grid  files  by  each  process.  The  user  needs  to  supply 
the  cormectivity  information  for  each  block.  This 
includes  information  about  shared  faces  and  fixed 
patches. 

Grid  lines  must  be  continuous  across  ad¬ 
joining  blocks.  The  inter-block  faces,  edges  and 
vertices  which  do  not  describe  a  definite  fixed  body 
must  be  free  to  float  in  the  space.  This  requires  that 
the  block  faces  be  solved  with  an  exchange  of  in¬ 
formation  across  the  block  faces.  Each  block  has  a 
layer  of  ghost  cells  that  contain  data  from  the  neigh¬ 
boring  blocks.  This  data  is  updated  at  intermediate 
intervals  using  asynchronous  conununication. 

Each  processor  computes  the  control  func¬ 
tions  and  runs  the  elliptic  solver.  A  global  norm  is 
computed  after  every  iteration  to  determine  the 
convergence  criteria.  Solution  interpolation  and 
boundary  point  movement  using  NURBS  is  done 
after  every  n  iterations  as  specified  by  the  user. 

Tliis  algorithm  achieves  scalability  by  the 
use  of  threads.  If  excess  processors  are  available, 
the  processes  subdivide  their  domain  by  uiurolling 
the  outmost  loop  of  the  solver  and  the  search  algo¬ 
rithm.  Threads  ate  spawned  to  work  on  each  subdo¬ 
main.  Use  of  threads  is  advantageous  in  mote  than 
one  ways.  Scalability  is  the  primary  advantage. 
Maximum  use  of  available  resources  is  harnessed 
by  splitting  blocks  into  threads.  Controlling  num¬ 


ber  of  threads  spawned  by  each  process  aids  in  load 
balancing.  The  larger  blocks  can  spawn  more 
threads  than  the  smaller  blocks.  Splitting  the  do¬ 
main  into  smaller  chunks  leads  to  better  cache  per¬ 
formance 

Threads  are  used  with  MPI  although  the 
MPICH  implementation  is  not  thread  safe  [7]. 
Therefore  only  one  thread  is  active  at  the  time  of  in¬ 
ter-process  communication.  The  rest  of  the  threads 
stop  during  this  time.  Although,  it  is  not  really  es¬ 
sential  to  stop  the  other  threads.  A  further  enhance¬ 
ment  to  the  current  version  would  have  other 
threads  continuing  the  computations  with  locks  on 
the  data  while  only  one  thread  is  dedicated  to  com¬ 
munications. 

4.1  Handling  shared  faces 

To  guarantee  complete  continuity  of  grid 
lines  across  block  faces,  each  block  sends  a  face  to 
its  neighbouring  block  as  shown  in  Figure  1  .  The 
elliptic  generator  can  then  run  using  the  face  from 
the  neighbourmg  block  as  the  dirichlet  boundary. 
After  every  iteration  the  new  updated  faces  are 
transmitted  to  the  neighboring  block.  This  way ,  the 
face  points  are  solved  using  the  elliptic  equations 
at  every  step.  However,  each  block  solves  for  the 
face  points  separately.  Hence  it  is  possible  that  the 
face  points  may  not  coincide  after  an  iteration. 

To  get  rid  of  this  slight  discontinuity,  some 
sequential  multiblock  codes  keep  a  track  effaces, 
edges  and  vertices  belonging  to  each  block. 
Connectivity  tables  are  maintained  to  identify 
shared  vertices  edges  and  faces.  At  the  end  of  every 
iteration,  these  connectivity  tables  are  used  to  pick 
out  the  copies  of  the  common  face,edge  or  vertex 
and  then  an  averaged  (or  elliptically  solved  )  value 
is  broadcast  to  all  the  owners.  Another  method  es- 


Figure  1  Exchange  of  information. 


pecially  amenable  in  C  codes  uses  pointers  to  faces, 
so  that  only  a  single  copy  of  a  shared  face  is 
maintained,  with  both  blocks  pointing  to  the  same 
memory  location. 

Clearly  the  later  strategy  cannot  be  used 
when  the  blocks  are  being  solved  on  different 
processors.  The  first  strategy  of  collecting  all 
copies  and  then  broadcasting  an  average  can  work 
in  a  parallel  implementation  but  it  means  a  lot  of 
commimications.  (8  vertices  12  edges  and  6  faces 
per  block).  Assume  a  simple  structured  topology. 
Each  face  can  be  shared  by  only  one  other  block. 
One  edge  is  shared  by  4  blocks  and  a  vertex  is 
shared  by  8  blocks.  In  such  a  case,  for  getting 
unique  face, edge  and  vertex  locations  each  block 
needs  to  communicate  with  26  neighbouring 
blocks  even  in  the  simplest  topology! 

It  was  argued  that  since  an  elliptic  system 
has  a  3  point  stencil,  and  since  both  the  blocks  have 
identic^  copies  of  all  the  three  faces  required  for 
computation  of  the  block  boundary  face,  the  face 
points  calculated  by  the  each  block  using  the 
elliptic  system  should  be  the  same.  Hence,  the  face 
points  are  solved  individually  using  Point  Jacobi  It¬ 
eration  (to  guarantee  a  three  point  stencil).  The 
block  interior  on  the  other  haiid  is  solved  using  the 
tridiagonal  system.  The  control  functions  for  the 
points  on  the  shared  face  are  also  evaluated  using 
only  a  3  point  stencil  guaranteeing  that  neighboring 
blocks  compute  exactly  identical  locations  of  the 
shared  points. 

Simplicity  of  the  entire  scheme  is  a  major 
advantage.  No  complicated  global  connectivity 
tables  need  to  be  maintained.  This  makes  the  code 
extremely  flexible,  enabling  it  to  handle  a  wide  va¬ 
riety  of  block  topologies,  including  0  grids  em¬ 
bedded  in  H  grids,  Perio^c  boundary  conditions 
etc. 

For  the  above  strategy  to  work,  the  basic 
premise  is  that  all  blocks  sharing  a  particular  point 
must  have  identical  copies  of  its  complete  stencil. 
The  user  input  specifies  only  the  blocks  that  share 
a  face  with  the  current  block.  This  means  that  a 
block  has  no  information  about  its  diagonally  op¬ 
posite  neighbor.  A  communication  strategy  that  al¬ 
lows  each  block  to  acquire  information  from  its 
diagonal  neighbors  is  described  in  the  next  section. 

4.2  Commimication  of  shared  faces 

Each  block  knows  only  the  neighboring 
blocks  sharing  a  face  with  it.  This  means  the  block 
cannot  directly  get  the  comer  points  from  its  diago¬ 


nally  opposite  block.  To  overcome  this  problem, 
the  blocks  communicate  faces  inclusive  of  the  extra 
points  received  from  the  neighboring  blocks  ( 
shown  by  the  dashed-blocks).  This  however  means 
that  the  block/blocks  that  send  the  points  before  re¬ 
ceiving  the  faces  from  the  neighbors  would  send 
out  void/old  points  to  the  neighbors.  One  way  to 
overcome  this  problem  is  to  perform  cotmnunica- 
tion  in  a  cyclic  loop.  However  this  means  the  entire 
communication  will  be  sequential  and  result  in 
large  conununication  latency.  The  problem  can  be 
overcome  by  simply  doing  the  communication 
twice.  This  strategy  idlows  us  to  use  asynchronous 
communication,  so  that  more  than  just  2  blocks 
communicate  at  any  instant.  In  case  of  a  simple 
topology  discussed  in  the  earlier  section  each  block 
would  now  communicate  only  12  times.  Note  that 
doing  the  entire  communication  twice  does  involve 
transmission  of  some  redundant  information.  This 
could  be  avoided  by  sending  only  the  edges  and 
vertices  in  the  second  conununication.  This  feature 
has  yet  to  be  implemented  in  PMAG. 

43  Parallel  multiblock  interpolation 

As  the  points  are  moved  the  original  solu¬ 
tion  needs  to  be  interpolated  Onto  the  new  grid.  A 
parallel  gird  adq>tation  algorithm  supporting  gen¬ 
eral  multiblock  topologies  makes  solution  inter¬ 
polation  significantly  more  complex.  The  grid 
points  can  move  outside  the  original  domain  of  a 
block.  In  such  a  case,  the  block  does  not  have 
enough  information  to  interpolate  the  solution  and 
adaptive  functions  to  all  its  points.  Each  block  now 
needs  to  query  all  other  blocks  for  the  points  that  are 
no  longer  within  its  own  domain. 

The  search  algorithm  starts  with  a  search 
and  interpolation  of  all  points  found  within  each 
block.  Each  process  creates  a  list  of  points  that  are 
not  found  within  its  domain.  These  lists  are  con¬ 
catenated  into  a  global  list  which  is  broadcast  to  all 
processes  for  search.  The  processes  then  search  and 
find  the  interpolated  solutions  for  the  points  found 
within  their  domains.  A  final  all-neduce  over  all 
processes  makes  the  solutions  known  to  all  the  pro¬ 
cesses.  This  operation  takes  4’'‘log  P  conununica- 
tions.  A  Shift  operation  ( the  list  of  external  points 
are  shifted  in  a  circle  through  all  processes)  would 
take  P  communications  to  complete.  Hence  a  sUft 
would  be  faster  for  number  of  blocks  <  16.  Ideally 
a  polyalgorithm  should  be  used  to  switch  methods 
according  to  the  number  of  processes.  The  shift  has 
not  yet  been  implemented  in  PMAG. 


4.4  Results  and  discussion 

PMAG  proved  very  successful  in  creating 
adapted  grids  for  the  ARL-Missile  cases  tested  as 
a  part  of  the  KTA-12  program  [8].  Table  1  shows 
performance  of  PMAG  for  the  ARL-Missile  case 
split  into  1, 2  or  6  blocks.  Table  2  shows  the  abso¬ 
lute  efficiency  for  6  blocks  run  on  varying  number 
of  processors  (compared  to  runtime  for  a  single 
block  on  a  single  processor).  Processes  time  share 
when  available  processors  are  less  than  number  of 
blocks.  Threads  are  spawned  when  excess  proces¬ 
sors  are  available  ( 9  and  11 ).  The  efficiency  drops 
for  the  last  two  cases  since  NURBS  and  solution  in¬ 
terpolation  do  not  use  the  threads  as  yet.  All  tests 
were  done  on  a  12  processor  SGI  Onyx  (CPU: 
150MHz  R4400  with  1.5  GB  RAM) .  NPARC  was 
used  as  the  flow  solver  [4]. 

Table  1  Runtimes  for  PMAG  (mm:ss) 

# Blocks  #Procs  RunTime  Speedup 

1  i  58:15  i 

2  2  32:33  Ts 

6  6  '  m4  44 


Table  2  Absolute  Speedup  and  Efficency 
for  6  blocks 


Procs 

RunTime 

Speedup 

Efficiency 

1 

1:12:20 

0.8053 

1.2418 

2 

40:20 

1.4442 

0.7221 

3 

26:30 

2.1981 

0.7326 

6 

13:14 

4.4017 

0.7336 

9 

11:28 

5.0799 

0.5644 

11 

10:07 

5.7570 

0.5227 

Figuie2  Vortex  at  x=8.4 


Figure  3  6-Block  Grid  around  the  Missile 

Figure  4  shows  4  block  2-D  geometry  of 
a  convergent-divergent  nozzle  for  the  Utan  rocket. 
PMAG  was  used  by  stacking  grids  in  the  third  di¬ 
mension  to  emulate  a  3-D  geometry. 


Figure  4  Nozzle  solution  with  adapted  grid 

Figure  5  demonstrates  PMAG’s  capability 
to  generate  smooth  elliptic  grids  around  compli¬ 
cated  topologies.  The  grid  is  a  single  block  0-grid 


Figure  5  3D  geometry  with  grid 

with  periodic  boundary  condition  at  the  kmax- 
kmin  face.  The  original  grid  was  generated  with  a 


marching  algorithm.  However  that  results  in  highly 
skewed  grids  at  the  comer  shown  in  Figure  6. 
PMAG  was  used  to  smooth  the  grid.  Most  elliptic 
solvers  give  a  crossover  at  such  critical  regions  and 
lose  the  grid  point  spacing  near  the  boundary. 


Figure  6  Zoomed  view  of  critical  section 

PMAG  can  also  be  used  as  a  simple  elliptic 
grid  generator.  Figure  4  shows  a  complex  5  block 
grid  smoothened  using  PMAG’s  elliptic  solver. 
The  runtime  is  compared  with  a  serial  multiblock 
grid  generator  GUMB  which  is  aderivative  of  NGP 
[9].  The  speedup  shows  an  absolute  efficiency  of 
0.96  (Table  3). 


Rgure?  5-Block  grid  around  Missile  Fin. 


Table  3  Runtime  for  GUMB  v/s  PMAG 


Algorithm 

#procs 

RunTime 

GUMB 

1 

2:36:16  hours 

PMAG 

7 

23:55  mins 
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verse  NURBS  formulation.  A  parallel  multiblock  solution  interpolation  algo¬ 
rithm  has  been  incorporated  to  guarantee  accurate  adaptation.  The  algorithm 
can  also  be  used  simply  as  a  multiblock  elliptic  grid  generator. 

Several  test  cases  with  a  variety  of  flow  characteristics  are  presented  to 
demonstrate  tiie  capabilities  of  the  algorithm.  Comparisons  made  with  exist¬ 
ing  algorithms  show  distinct  advantages  gained  by  using  this  approach. 


DEDICATION 
Td  my  parents 


u 


ACKNOWLEDGEMENTS 


I  would  like  to  express  my  deep  gratitude  to  my  major  advisor  Dr.  Bha¬ 
rat  K,  Soni  for  his  kind  guidance  and  support.  His  constant  encouragement 
and  appreciation  of  individual  effort  has  been  a  major  driving  force  throught 
this  project.  I  would  also  like  to  thank  my  committee  members  Dr.  Anthony 
SkjeUum  and  Dr.  Joe  Thompson  for  their  invaluable  advice  and  guidance. 

I  would  like  to  thank  my  friends  and  colleagues  at  the  Engineering  Re¬ 
search  Center,  whose  support  has  been  an  invaluable  eisset.  A  special  thanks 
to  Dr.  Roy  Koomullil,  Dr.  Hugh  Thornburg,  Dr.  Tzu  "Yi  Yu  and  Puri  Bangalore 
for  their  help  and  guidance. 

I  am  grateful  to  the  Engineering  Research  Center  (ERC)  and  the  Army 
Research  Labs  (ARL)  for  funding  this  project. 

The  greatest  acknowledgement  goes  to  my  parents  and  my  family  for 
their  constant  encouragement  and  understanding  through  all  these  years  of 
my  life. 


m 


TABLE  OF  CONTENTS 


Page 

DEDICATION. .  ii 

ACKNOWLEDGEMENTS .  iii 

LIST  OF  TABLES  .  vi 

LIST  OF  FIGURES .  vii 

CHAPTER 

L  INTRODUCTION .  1 

Background . 1 

Overview  of  Existing  work  . 3 

Motivation .  5 

Overview  of  the  Present  Work  .  7 

Organization .  8 

n.  THEORITICAL  DEVELOPMENT  FOR  GRID  ADAPTATION .  10 

Introduction  . 10 

The  Elliptic  System  of  Equations  .  10 

Greometric  Control  Functions  .  11 

Solution  Based  Adaptive  Control  Functions . 13 

Boundary  point  movement  using  NURBS  patches .  18 

Background  on  NURBS .  19 

Implementation  ofNeumsinn  Boundary  Condition .  20 

Search  and  Interpolation .  22 

m.  PARALLEL  MULTIBLOCK  GRID  ADAPTATION .  29 

Background  on  Parallel  Computing .  29 

Design  Methodology. .  30 

Performeince  Metrics .  31 

iv 


CHAPTER 


Page 


Parallel  Implementation .  33 

Treatment  of  Shared  Faces .  36 

Block— block  communication  of  shared  faces .  40 

Parallel  Multiblock  Solution  Interpolation . 42 

IV.  RESULTS .  45 

The  ARL  Missile  test  cases .  45 

Parallel  Performance .  47 

Grid  Adaptation  Results .  48 

5-Block  grid  around  a  Missile  Fin .  53 

General  3-D  geometry .  65 

Bronchial  Tubes .  58 

Titan  Nozzle  .  59 

Conclusions .  60 

Future  Work . .  61 

APPENDIX 

A.  TRANSFORMATIONS  AND  TRIDIAGONAL  FORMULATION  ...  63 

Transformation  Relations  .  64 

Formulation  of  the  Tridiagonal  System .  66 

B.  USERS  GUIDE .  68 

Introduction  . .  69 

Overview  and  Terminology  .  69 

TTie  Basic  Steps  .  70 

The  Include  File  “PARAM.INC” .  70 

The  Input  files  .  71 

Local  input  files:  info.## . 71 

Global  input  file:  info.global  .  73 

The  process  group  file:  PMAG.pg .  75 

REFERENCES .  77 


V 


LIST  OF  TABLES 


TABLE  Page 

4.1  Comparison  of  runtime  for  PMAG  with  Hugh’s  Algorithm .  46 

4.2  Absolute  Speedup  and  EfiBcency  for  6  blocks  . . .  47 

4.3  Runtime  over  a  network  of  workstations .  48 

4.4  Forces  and  Moments  Comparison  for  Case  3  .  49 

4.5  Comparison  of  runtime  for  GUMB  v/s  PMAG  .  53 


VI 


LIST  OF  FIGURES 


FIGURE  Page 

2.1  Comparison  of  weights .  16 

2.2  Solution  based  weight  ftinction  .  17 

2.3  Effect  of  Epsilon .  17 

2.4  Loss  of  Geometric  information .  18 

2.5  Distribution  Space  for  NURBS .  21 

2.6  Effect  of  Solution  Interpolation  .  23 

2.7  Mastin’s  Interpolation  Scheme  I  .  24 

2.8  Mastins  Interpolation  Scheme  II .  24 

2.9  Problem  with  direct  solution .  26 

2.10  Multiblock  Solution  Interpolation . 27 

3.1  Splitting  a  block  using  threads .  35 

3.2  Sample  Block  Information  File .  36 

3.3  Flowchart . 37 

3.4  Problem  in  conventional  face  exchange .  41 

3.5  Strategy  to  get  comer  data .  41 

3.6  Algorithm  for  Parallel  Multiblock  Search .  44 

4.1  Comparison  of  adaptation  at  shock .  46 

vii 


FIGURE  Page 

4.2  Graph  of  Speedup  .  48 

4.3  6-Block  Grid  around  the  Missile .  49 

4.4  Vortex  at  x=8.4 .  50 

4.5  Shock  from  front  view .  50 

4.6  Improvement  in  shock  resolution  .  51 

4.7  Close  up  of  improved  shock .  52 

4.8  Smoothened  grid  around  the  fin  .  53 

4.9  5-Block  grid  around  Missile  Fin  .  54 

4.10  The  O  type  grid  .  55 

4.11  3D  geometry  with  grid . 56 

4.12  Side  view  of  entire  geometry .  57 

4.13  Zoomed  view  of  critical  section .  57 

4.14  Bronchial  Tubes .  58 

4.15  Nozzle  solution  with  adapted  grid .  59 


CHAPTER  I 
INTRODUCTION 

Background 

The  past  decade  has  seen  a  phenominal  increase  in  the  thrust  towards 
numerical  simulation  of  complex  three  dimensional  flows  associated  with  real¬ 
istic  configurations.  Computational  flow  simulation  involves  numerical  solu¬ 
tion  of  the  equations  representing  the  actual  physics.  The  numerical  methods 
for  solving  various  flow  equations  have  been  available  for  a  long  time.  Howev¬ 
er,  the  capability  to  handle  complex  realistic  configurations  has  been  paced 
mainly  by  the  ability  to  numerically  discretize  the  domain  of  interest  into  a 
mesh  of  points. 

Grid  generation  is  the  starting  point  of  any  computational  fluid  dynam¬ 
ics  simulation.  Creation  of  a  suitable  grid  over  a  complex  multiblock  dninflin 
has  always  been  one  of  tiie  most  time  consuming  aspects  in  flow  simulation. 
Grids  have  to  accurately  represent  the  geometry  imder  consideration.  Grid 
points  have  to  be  dense  enough  to  resolve  the  important  flow  features  since 
the  accuracy  of  a  numerical  algorithm  depends  heavily  on  the  grid  spacing  ( a 
sparse  gnd  in  a  region  of  high  flow  gradients  results  in  a  loss  of  important  flow 
features).  In  most  cases  sufScient  knowledge  of  the  flow  physics  is  not  known 
a  priori  to  increase  grid  point  density  in  the  important  regions.  Using  a  xmi- 
formly  dense  mesh  would  result  in  excessive  usage  of  computer  memoiy  and 
CPU  time.  Further,  such  high  resolution  is  required  in  a  veiy  small  area  of 
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the  flow  region.  In  most  cases  the  initial  grid  can  resolve  only  the  general  fea¬ 
tures  of  the  solution. 

Grid  adaptation  involves  distributing  the  grid  points  in  an  optimum 
fashion  to  give  higher  resolution  of  important  flow  features.  There  are  three 
distinct  approaches  to  adapting  grids:  redistribution,  refinement  /  derefine¬ 
ment,  and  local  increase  of  the  order  of  the  numerical  algorithm.  Although  the 
last  method  requires  no  change  in  the  grids,  it  requires  highly  complex  solu¬ 
tion  algorithms.  Mesh  refinement  involves  adding  points  in  the  region  of  high 
gradients  [281  [29].  This  method  guarantees  that  the  global  error  will  not  in¬ 
crease  since  no  depletion  of  points  occurs  in  other  regions.  However,  it  requires 
maintainace  of  complex  data  structures  to  allow  addition  of  points  anywhere 
in  the  region.  If  starting  from  a  uniform  grid.  Adaptive  Mesh  Refinement 
(AMR)  results  in  cells  that  are  more  or  less  isometric  (Aspect  Ratio  1 ).  This 
constraint  is  particularly  unattractive  in  case  of  boundary  layer  cells  requir¬ 
ing  high  aspect  ratio.  The  true  advantage  of  using  AMR  methods  lies  in  topolo¬ 
gies  involving  largely  varying  length  scales.  Astrophysics,  Ocean  modelling 
etc.  would  be  typical  examples  of  such  topologies. 

Redistribution  is  achieved  by  moving  grid  points  locally  or  globally  to 

tv  ■  ■ 

regions  of  higher  gradients.  This  method  does  not  increase  memory  or  CPU 
requirement  since  total  number  of  grid  points  remains  the  same.  Boundary 
layers  can  be  resolved  very  well  with  optimum  number  of  points.  Redistribu¬ 
tion  works  the  best  for  topolgies  that  do  not  contain  length  scales  that  differ  by 
many  orders  of  magnitude.  Most  aerospace  geometries  have  length  scales  that 
do  not  need  an  AMR  approach  to  get  the  best  efficiency.  Simplicity  of  solution 
algorithms  and  data  structures  required  is  a  mqjor  advantage  of  this  method. 


3 

Grid  redistribution  for  structured  grids  can  be  done  either  algebraically 
or  by  solving  partial  differential  equations.  The  algebraic  methods  are  very 
fast  but  can  result  in  highly  skewed  grids  in  complex  geometries.  Elliptic  par¬ 
tial  differential  equations  on  the  other  hand  are  much  slower  but  give  smooth 
and  close  to  orthogonal  grids  even  after  adaptation.  This  makes  these  methods 
highly  desirable. 


Overview  of  Existing  work 

Several  grid  adaptation  algorithms  for  structured  grids  are  available  at 
present.  These  algorithms  can  be  broadly  classified  into  two  types:  those  based 
on  algebraic  methods,  and  those  based  on  solution  of  PDEs.  SAGE  [91  and 
Yang’s  algorithm  [16]  are  examples  of  the  algebraic  grid  adaptation  methods, 
while  the  EAGLE  code  [51  and  Thornburg’s  work  [2]  are  based  on  solution  of 
elliptic  partial  differential  equations. 

SAGE  ( Self-Adaptive  Grid  Code)  is  based  on  the  algebraic  method  out¬ 
lined  by  Nakahashi  and  Deiwert  [8].  The  adaptive  grid  procedime  is  based  on 
grid— point  redistribution  through  local  error  minimization  using  the  spring 
analogy  form.  Davis  and  Venkatapathi  [9]  evolved  this  method  into  a  flexible 
tool  for  adapting  two-dimensional  and  three-dimensional  grids  with  multiple 
blocks.  Multidimensional  adaptation  is  split  into  a  series  of  one  dimensional 
adaptations  along  each  coordinate  line,  adaptation  is  done  line  by  line  and 
plane  by  plane  for  all  planes  specified.  A  weight  function  proportipnal  to  the 
derivative  of  a  flow  property  is  used.  Tbrsional  force  term  is  used  to  achieve 
orthogonality  and  smoothness  of  grid.  The  force  is  calculated  as  torsion  be¬ 
tween  adjacent  grid  lines.  Although  this  method  is  fast,  the  grids  generated  by 
this  method  are  not  always  smooth.  In  case  of  multiblock  grids,  the  code  trans¬ 
fers  the  shared  face  from  first  block  to  second.  The  common  face  is  evaluated 
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only  in  the  first  block.  This  method  would  achieve  C®  continuity  but  cannot, 
guarantee  slope  continuity.  Fiuther,  a  lot  of  user  input  is  required  for  using 
the  code  which  makes  it  difidcult  to  use  with  large  number  of  blocks. 

Yang  [16]  has  created  a  general  purpose  adaptive  grid  generation  sys¬ 
tem  based  on  algebraic  redistribution  of  points  using  NURBS.  A  weight  func¬ 
tion  evaluated  with  properly  weighted  boolean  sum  of  various  flow  filed  char¬ 
acteristics  is  used.  It  applies  inverse  NURBS  formulation  to  compute  NURBS 
control  points  for  the  surfaces.  This  maintains  geometric  fidelity  of  the 
adapted  grids.  The  multidimensional  grid  adaptation  is  split  into  a  series  of 
two-dimensional  grid  adaptations  along  one  of  the  computational  coordinate 
lines.  The  two-dimensional  algebraic  grids  are  smoothened  by  a  few  iterations 
of  an  elliptic  solver.  This  dilutes  the  adaptive  effect  and  does  not  guarantee 
smoothness  of  grid  lines  along  the  third  dimension.  Yang’s  algorithm  has  pro¬ 
duced  excellent  results  on  two  dimensional  grids  and  simple  three  dimension¬ 
al  regions.  There  is  no  capability  to  adapt  multi-block  grids. 

The  ElAGLE  code  is  a  composite  grid  generator  supporting  completely 
general  configurations  [5].  This  code  uses  elliptic  grid  generation  system  with 
automatic  evaluation  of  control  functions,  either  directly  fi'om  the  boundary  or 
fi*om  the  boundary  point  distributions.  Neumann  boundary  conditions  have 
been  provided  in  addition  to  control  functions  to  achieve  orthogonality  at  the 
boundaries.  Boundary  surfaces  are  represented  as  splines.  The  new  boimdary 
points  are  located  by  orthogonal  projection  of  adjacent  points  using  Newtons 
iteration.  Slope  continuity  across  blocks  is  guaranteed  by  forcing  the  grid  lines 
on  both  sides  to  be  orthogonal  to  the  shared  face.  The  EAGLE  adaptation  rou¬ 
tine  does  not  have  solution  interpolation  built  in.  Hence  grid  adaptation  must 
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be  done  either  at  every  time  step  or  the  PDE  solver  must  accoimt  for  grid 
speeds  [6]  [7]. 

Thornburg's  approach  [2]  also  involves  grid  adaptation  using  solution  of 
elliptic  differential  equations.  A  newly  developed  weight  function  emplo3dng 
Boolean  sums  is  utilized  to  represent  the  local  truncation  error.  A  NURBS  rep¬ 
resentation  is  employed  to  define  block  sxufaces  for  bounday  point  redistribu¬ 
tion.  A  coupled  strongly  implicit  procedure  (CSIP)  as  described  by  Ghia  et.al. 
[4]  has  been  implemented  for  the  solution  of  discretized  equations.  Upwind 
differencing,  with  biasing  based  on  the  sign  of  the  forcing  functions,  as  well  as 
central  differencing  has  been  implemented  for  the  derivative  terms.  The  first 
order  upwind  differencing  increases  the  stability  of  the  procedure.  A  hybrid 
upwind/central  differencing  scheme  has  been  implemented.  The  interpolation 
procedure  employed  was  the  one  used  for  the  non-matching  block  to  block  in¬ 
terface  capability  of  the  NPARC  code  [18].  The  code  shows  excellent  results  on 
resolving  various  flow  features.  However,  Neumann  boundary  conditions  can 
be  applied  only  if  the  user  supplies  the  NURBS  definitions  of  the  original  ge¬ 
ometry.  It  is  extremely  slow  and  there  is  no  multiblock  handling  capability. 
The  current  work  is  primarily  based  on  the  weights  developed  by  Thombmrgh 
and  Soni.  More  detailed  description  of  these  is  included  in  Chapter  11. 

Motivation 

Most  realistic  configurations  consist  of  complex  multiblock  topologies. 
The  adaptation  algorithm  thus  requires  to  have  the  capacity  to  handle  topolo¬ 
gies  with  any  kind  of  block  coimectivities.  Grid  lines  should  be  completely  con¬ 
tinuous  across  boundaries  without  forcing  orthogonality  to  achieve  slope  conti- 
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lypically  when  using  grid  adaptation,  the  solver  has  to  be  stopped  after 
a  certain  number  of  iterations  for  modifying  the  grids  using  a  grid  adaptation 
algorithm.  This  (ycle  is  repeated  a  few  times  until  complete  convergence  is 
achieved. 

Experience  with  a  realistic  missile  geometry  (  >  1  million  points  ) 
showed  that  adaptation  based  on  elliptic  grid  generation  yields  much  better 
results  for  a  flow  containing  vortices  as  well  as  shocks.  However,  the  CSIP 
’  grid  adaptation  algorithm  took  about  a  third  of  the  total  time  required  for 
computing  the  final  solution  ( order  of  10  hours  for  each  adaptation  cycle  on  a 
million  point  grid  ).  This  being  quite  a  significant  amoimt  of  time,  attention 
was  focused  towards  creation  of  faster  algorithms  for  grid  adaptation. 

Adapted  grids  get  skewed  near  body  surface  if  the  boundary  points  are 
not  moved  during  redistribution.  Dirichlet  boundary  conditions  keep  the 
boundary  points  fixed.  This  makes  it  necessary  to  design  an  algorithm  that 
allows  Neumann  boundary  conditions. 

Intermediate  solution  interpolation  is  essential  to  guarantee  that  grid 
points  are  concentrated  in  the  correct  regions.  Parallel  multiblock  grid 
adaptation  ftuther  complicates  the  solution  interpolation  proccidure.  Moving 
block  faces  require  each  block  to  query  other  blocks  for  points  that  have  moved 
outside  its  domain.  A  fast  parallel  multiblock  interpolation  routine  is  there¬ 
fore  essential  in  the  algorithm. 

Solution  speed  can  be  dramatically  increased  with  parallel  processing. 
The  growing  abimdance  of  relatively  cheap  high  performance  parallel  ma¬ 
chines  makes  it  more  practical  to  design  algorithms  that  aim  at  solving  volu¬ 
minous  problems  on  multiple  processors.  A  parallel  grid  adaptation  code  was 
hence  conceived. 


The  Parallel  Multiblock  Adaptive  Grid  generation  (PMAG)  algorithm 
has  been  designed  to  handle  a  general  multiblock  topology.  There  is  absolutely 
no  restriction  on  block  connectivity.  A  block  can  be  be  connected  to  any  block 
including  itself,  thus  supporting  a  wide  range  of  complex  three  dimensional 
topologies. 

The  algorithm  is  designed  to  adapt  a  multiblock  grid  concurrently  with 
each  block  solved  in  an  individual  process.  These  processes  can  be  run  on  a 
single  parallel  machine  or  distributed  over  a  network  of  workstations. 

MPI  is  used  as  the  message  passing  interface.  The  advantages  of  using 
MPI  are  that  it  has  the  capability  to  have  asynchronous  communications  to 
hide  communication  latency,  it  is  supposed  to  work  over  a  heterogenous  net¬ 
work,  and  it  is  now  a  standard.  [26]  [271 

Grid  adaptation  in  PMAG  is  based  on  the  boolean  sum  of  weight  func¬ 
tions  suggested  by  Thornburg,  et.al.  [2].  These  weight  functions  have  demos- 
trated  the  capacity  to  detect  shocks  of  differing  strengths,  primary  and  sec¬ 
ondary  vortices,  and  shear  layers  adequately.  Enhancements  have  been  made 
for  evaluating  globally  normalized  weights  for  multiblock  grids  as  suggested 
by  Thornburg. 

A  simple  tridiagonal  solver  is  used  for  generating  the  elliptic  grids.  The 
solver  is  much  faster  than  CSIP,  and  has  shown  no  major  disadavantage  in  the 
quality  of  grids  achieved. 

Neumann  boundaiy  conditions  have  been  incorporated  to  maintain  grid 
quality  near  body  surfaces.  PMAG  is  designed  to  allow  boimdary  point  move¬ 
ment  by  defining  the  boundaries  as  NUBBS  surfaces.  This  guaratees  that  the 
geometric  definition  is  preserved  accurately. 
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Solution  interpolation  is  based  on  the  method  suggested  by  Stokes  [20]. 
Several  enhancements  have  been  made  for  faster  and  more  robust  interpola¬ 
tion.  The  same  algorithm  has  been  extended  for  pareJlel  multiblock  solution 
interpolation. 

PMAG  has  been  designed  primarily  on  the  SGI  Power  Challenge  plat¬ 
forms.  It  has  the  capacity  to  do  concurrent  computing  at  two  levels.  Each 
MPI-process  can  further  spawn  threads  on  a  parallel  shared  memory  machine 
giving  finer  grained  parallelism.  The  ability  to  spawn  threads  helps  in  many 
ways.  Better  scalability,  a  simplistic  load  balancing  technique,  better  cache 
utilization  by  reducing  array  sizes  being  the  most  significant  of  the  advan¬ 
tages.  The  code  has  also  been  ported  to  Sxm  workstations  ( without  threads  ). 

Organization 

The  subsequent  chapters  take  a  detailed  look  at  each  of  the  aspects 
mentioned  here. 

Chapter  II  discusses  the  theoritical  development  of  the  basic  adaptation 
algorithm  based  on  solution  of  elliptic  partial  differential  equations.  This  in¬ 
cludes  discussion  of  the  geometric  and  solution  adaptive  controls,  implementa¬ 
tion  of  NURBS  and  a  fast  solution  interpolation  algorithm.  Most  of  these  top¬ 
ics  are  based  on  earlier  work  by  others,  with  a  few  modifications  to  improve 
performance. 

Chapter  HI  discusses  the  mqjor  issues  involved  in  implementing  the 
code  in  a  parallel  paradigm.  Multiblock  issues  such  as  strategies  for  shared 
face  evaluation,  inter-block  communication,  and  solution  interpolation  are 
discussed  in  detail.  Implementation  of  multi— threading  for  shared  memory 
machines  (SGI  Power  Challenge  series)  is  also  discussed.  All  the  strategies 
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and  algorithms  dicussed  in  this  chapter  are  original  contributions  of  the  au¬ 
thor. 

Chapter  IV  includes  the  results  and  discussion  on  various  test  cases. 
Conclusions  and  areas  of  possible  future  work  have  also  been  suggested. 

A  users  guide  has  been  provided  for  reference  in  Appendix  B. 


CHAPTER  n 

THEORITICAL  DEVELOPMENT  FOR  GRID  ADAPTATION 

Introduction 

Traditionally,  smooth  structured  grids  have  been  created  as  solutions  of 
a  set  of  partial  differential  equations.  The  PDEs  can  be  either  parabolic,  hy¬ 
perbolic  or  elliptic.  Elhptic  equations  are  the  most  stable  and  have  a  smooth¬ 
ing  effect  on  the  grids.  In  most  cases  this  results  in  smooth,  near  orthogonal 
grids  even  if  the  boundary  has  sharp  comers.  This  is  desirable  for  most  flow 
solvers.  The  control  functions  in  the  Poisson  equations  make  it  possible  to 
cluster  the  grids  in  region  of  interest,  and  thus  provide  an  elegant  way  to  re¬ 
distribute  grids  to  resolve  important  flow  features. 

The  Elliptic  System  of  Equations 

A  grid  is  smooth  if  it  is  second  order  continuous.  Mathematically  if  q, 
g  are  curvilinear  coordinates  along  the  grid  lines,  then  the  solution  to  the 
Poisson  equations 

vh)  =  e 

=  6  (2.1) 

should  give  smootii  grids  as  long  as  the  flinctions  4>>  6  and  5  are  smooth  and 
continuous.  Structured  elliptic  grid  generation  involves  generation  of  grids 
that  satisfy  the  above  equations.  Tb  be  usable  for  generating  cartesian  coordi- 
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nates  X,  Y  and  Z,  we  must  first  cast  the  equations  to  have  X,  Y  and  Z  as  the 
dependent  variables.  This  leads  to  a  system  of  equation  of  the  type, 


3  3 

t=ij=i 


+ 


3 
k=l 


kk 


=  0 


Where: 


(2.2) 


ry  :  Position  vector, 

gy  :  Contravariant  metric  tensor, 

I*  :  Curvilinear  coordinate,  and 
Pjj  :  Control  Function. 

Once  cast  in  this  form,  the  differential  equations  are  discretized  using 
central  difference  form.  Discretized  equations  give  a  system  of  linear  equa¬ 
tions  that  can  be  cast  in  a  tiidiagonal  form.  Solution  is  obtained  by  iteratively 
solving  the  tridiagonal  system  of  equations.  The  details  of  this  method  are  dis¬ 
cussed  in  Appendix  A. 


Geometric  Control  Functions 

Laplace  equations,  with  tiie  fordng  functions  identically  equal  to  zero 
tend  to  create  eqm-distributed  grids.  However  in  case  of  viscous  flows  closely 
packed  grid  lines  are  critical  for  resolving  the  boundary  layer.  Further,  La¬ 
place  grids  also  move  grid  lines  away  firom  the  concave  boundaries,  and  to¬ 
wards  the  convex  boimdaries.  This  behaviour  is  highly  undesirable  and  hence 
forcing  functions  are  used  that  preserve  grid  line  spacing  and  orthogonality 
near  the  body  surface. 

Grid  line  spacing  would  be  maintained  if  forcing  functions  were  derived 
by  solving  the  equation  (2.2)  for  Pk>  Various  methods  of  evaluating  control 
functions  have  been  investigated  [11].  All  these  methods  assume  orthogonality 
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of  grid  lines  as  a  desirable  property.  The  method  discussed  by  Soni  [11  is  fast 
and  easy  to  implement. 

Consider  the  three  dimensioned  elliptic  grid  generation  system  : 

3  3  3 

S  =  0  (2.3) 

t=lj=l  k=l 

In  order  to  evaluate  the  forcing  functions  Pk,  k=l,2,3  the  usual  practice 
is  to  take  a  dot  product  of  the  equation  (2.3)  with  ,  q=l,2,3  and  write  the 
equation  as : 

3  3  3 

E  +  E  =  0  (2.4) 

*=i 

Using  the  definitions  fi’om  the  earlier  section,  we  can  rewrite  the  equa¬ 
tion  using  only  the  metric  terms  and  their  derivatives, 

3  3  3 

i=U=l  *=1 

-  i  ie*’  =0  «.6) 

t=lj=l 

If  grid  lines  are  assmned  to  be  orthogonal,  g^-  =  0  if  ( t  ).  Equation 
(2.5),  can  then  be  simplified  as  : 

3 

?{<{,  +  •  Ff.  )  =  0  q  =1^,3 

i«=l 

"  “  2  ^  ('“^)  ■ 

The  same  formula  can  also  be  expressed  as  shown  in  equation  (2.8) 


(2.6) 

(2.7) 
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2'  ^11  ^22  ^33  ^ 


1  .^22)g  _  ^33^  _  ^11 V 

2^  ^^22  ^33  ^11  ^ 


1  .^33)^  _  ^ll^S  _  ^22^i. 
2^  ^33  ^11  ^22  ' 


(2.8) 


This  is  the  form  used  in  tiie  algorithm  for  computing  the  geometric 
weight  functions.  The  metrics  gu,  g22,  g33  are  already  available,  and  hence 
the  evaluation  of  forcing  functions  is  very  easy. 


Solution  Based  Adaptive  Control  Functions 

Solution  based  grid  adaptation  involves  clustering  grid  points  in  the  re¬ 
gions  of  high  flow  gradients  to  achieve  better  resolution  of  flow  featmes,  and 
reduce  the  truncation  errors. 

Grids  can  be  viewed  as  a  collection  of  points  interliked  by  springs.  The 
elliptic  equations  are  obtained  by  minizing  energy  for  such  a  system  using  the 
variational  principle.  If  all  spring  constants  are  identical,  tiie  resulting  PDE  is 
the  Laplace  equation  and  the  solution  is  a  equidistributed  grid.  On  the  other 
hand,  desired  grid  point  distributions  can  be  obtained  by  varying  the  spring 
constants  appropriately. 

The  basic  idea  of  adaptive  redistribution  originates  from  the  principle  of 
equidistribution  of  error  [15].  The  point  spacing  must  be  inversely  proportion¬ 
al  to  some  measure  of  the  error  w  in  the  solution. 

w{  )  —  const  (2.9) 

The  weight  function  w  can  either  be  a  function  of  the  computational  coordi¬ 
nate  I  ( Spring  Analogy  form )  or  of  the  actual  physical  location  x  ( Point  Den- 
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sity  form ).  Since  the  solution  is  a  function  of  the  physical  coordinates,  it  would 
be  preferable  if  the  weight  function  were  derived  as  a  function  of  the  same. 
However,  since  solution  data  is  available  only  at  the  grid  points,  it  is  much 
easier  to  find  the  weight  function  at  grid  points.  Hence,  the  spring  analogy  ap¬ 
proach  is  followed.  However,  it  adds  an  overhead  of  having  to  interpolate/  re¬ 
calculate  the  weight  functions  sifter  every  few  iterations  as  the  points  move 
through  physical  space. 

The  formulation  of  such  weight  functions  that  accurately  reflect  the 
tnmcation  error  in  a  region  is  thus  the  most  critical  part  in  grid  adaptation. 
Derivatives  of  pressure  or  density  are  widely  used  in  contruction  of  weight 
functions.  For  a  given  physical  property  u  the  weights  are  typically  calculated 
using  a  formula  of  type.  [13]  [14]  [16]  [17] 

iy(^)  =  1+0  (2.10) 

Although  such  weights  have  proved  very  successful  in  resolving  inviscid  flow 
features  they  are  less  capable  in  resolving  viscous  features  such  as  boundary 
layer  and  vortices.  Further,  such  functions  require  the  user  to  input  relative 
weights  a,p.  For  a  multiblock  grid  adaptation  algorithm  aimed  at  complex  ge¬ 
ometries,  it  is  important  to  have  a  weight  function  that  needs  TninimiiTn  user 
input. 

The  weight  function  investigated  by  Thornburg  and  Soni  [2]  [3]  has 
proved  to  detect  flow  features  of  varying  strengths,  including  vortices,  bound¬ 
ary  layers  and  shocks  of  disparate  strengths  and  requires  almost  no  user  in¬ 
put.  This  feature  is  certainly  very  attractive  for  a  general  multiblock  grid 
adaptation  algorithm.  Relative  derivatives  are  used  so  that  weaker  but  impor¬ 
tant  features  such  as  vortices  are  also  reflected  accurately  m  the  weight  func- 
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Each  conservative  flow  variable  is  scaled  independently.  A  small  value 
epsilon  is  added  to  all  normalizing  derivatives,  primarily  to  handle  zero  veloc¬ 
ity  regions.  However,  this  epsilon  value  also  works  as  a  filter  as  described  lat¬ 
er.  A  boolean  sum  of  the  scaled  derivatives  is  used  to  compute  the  weight  func¬ 
tions  in  each  direction  which  are  then  normalized.  Weight  functions  in  each 
direction  are  then  added  with  another  boolean  sum  to  give  the  final  weight 
function. 


W  = 


max(WKW^,W^)  ®  max(WKW^,W^)  ®  max(WKW^,W^) 


W2 


e 


where, 
W*  =  1 


Addition  of  the  weights  in  different  directions  with  a  boolean  sum  as 
shown  in  Equation  (2.11)  is  an  important  unique  feature  of  tiiis  method. 
Figure  2.1  shows  a  comparison  of  grid  adaptation  with  and  without  a  boolean 
sum  of  the  three  weights.  The  boolean  sum  makes  the  grid  adaptation  uniform 
in  all  directions,  making  it  possible  to  resolve  circular  vortex  structures. 

The  control  functions  are  derived  from  the  weight  function  using  the 
following  formula: 
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Figure  2.1  Comparison  of  weights 


'  Wi 


(2.12) 


In  order  to  conserve  the  geometrical  characteristics  of  the  original  grid, 
ilie  control  functions  used  for  the  elliptic  solver  are  formulated  as  a  combina¬ 
tion  of  the  geometric  control  fimctions  and  the  adaptive  control  functions 


[161. 


•P  ”  OgeomP geom  +  ^adajp adap  (2.13) 


Figure  2.2  shows  the  weight  function  detecting  a  vortex  behind  a  mis¬ 
sile  at  14  degree  angle  of  attack. 
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The  pniflll  value  epsilon  used  in  the  normalizing  derivatives  also  helps 
in  filtering  out  small  disturbances  in  the  flow  field  that  cause  the  grid  to  be 
concentrated  in  the  non-critical  regions.  Increasing  the  value  of  epsilon  re¬ 
duces  the  sensitivity  of  the  weight  function  to  small  disturbances  in  the  flow 
field.  Figure  2.3  illustrates  the  effect  of  increasing  the  value  of  epsilon  on  the 
weight  function. 

Boundary  point  movement  using  NURBS  patches 
Strong  grid  adaptation  near  body  surfaces  leads  to  highly  skewed  grids 
if  the  boimdaiy  points  are  not  moved.  The  most  important  concern  in  moving 
boundary  points  is  preserving  geometric  fidelity.  As  shown  in  Figure  2.4  sim¬ 
ple  linear  interpolation  results  in  the  loss  of  geometry  information  that  is 
highly  undesirable. 


•  :  Original  points  on  curve 

X  :  Lanearly  interpolated  points 

Figure  2.4  Loss  of  Geometric  information 


Boundary  point  movement  without  loss  of  geometric  details  can  be  ac¬ 
complished  by  using  NURBS  definition  of  the  body  surfaces.  The  original  body 
surfaces  are  used  to  create  NURBS  definitions  using  inverse  NURBS  formula¬ 
tion.  Boundary  point  movement  is  achieved  by  reparameterization  of  the  sur¬ 
face  to  preserve  slope  continuity  or  orihogonality  at  the  boundaries  as  re¬ 
quired.  In  this  section  the  details  of  the  implementation  of  Neumarm 


19 

boundary  condition  will  be  discussed.  A  quick  overview  about  NURBS  and  a 
discussion  of  the  terminology  is  presented  in  the  next  sub-section. 

Background  on  NURBS 

When  creating  grids  for  real  world  configurations,  the  curves  and  sur¬ 
faces  of  the  body  are  given  either  in  discretized  form  or  as  parametrized  repre¬ 
sentations  using  Bezier  siirfaces,  B-Splines  or  NURBS.  Non-Uniform  Ratio¬ 
nal  B-Splines  referred  to  as  NURBS,  have  become  standard  tools  for 
representing  and  designing  fi*ee-form  as  well  as  standard  anal3rtic  shapes  of 
Computer  Aided  Geometric  Design  (CAGD).  A  few  reasons  for  popularity  and 
widespread  use  of  NURBS  are  that  they  are  true  generalizations  of  nonration- 
al  B-Spline  forms  as  well  as  rational  and  nonrational  Bezier  curves  and  sur¬ 
faces.  They  offer  a  common  mathematical  form  for  representing  both  analyti¬ 
cal  shapes  as  well  as  freeform  shapes.  Evaluation  of  NURBS  is  reasonably 
fast,  and  computationally  stable.  NURBS  are  invariant  under  scaling,  rota¬ 
tion,  translation  and  sheer  as  well  as  parallel  and  perspective  projection. 
[22][231 

NURBS  give  a  parametric  definition  of  any  geometric  entity  in  terms  of 
control  polygons,  knot  vectors  wd  associated  weights.  The  order  of  a  NURBS 
dictates  the  order  of  the  B-Spline  curves  used  for  representing  the  geometric 
entity.[24] 

Analytically  ,  NURBS  surface  is  defined  by  the  equation  (2^14),  where, 
dij  is  the  control  mesh  for  the  surfeice,  Wjj  are  the  weights  associated  with  each 
point  on  the  control  mesh,  {  Uq  ..Um+k  )  {  Vq  J  are  the  knot  vectors 

along  each  coordinate  direction  and  N\(u)  are  the  normalized  B-Spline  basis 
fimctionals. 
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m  n 

^u,v)  - 

^  '  m  n 

i=0j=0 


tt  e  [«*_!, 

ve  [V;_i,v„+i]  (2.14) 


The  normalized  B-spline  basis  functionals  are  evaluated  as, 


*  “»+*-!  “i  *  “i+A  “t+1  *  +  l 


1  ^  ri,ifttf  ^  a  ^  Mf  +  l] 
\Q,otherwise  | 


(2.15) 


The  De  Boor  algorithm  is  used  to  evaluate  the  physical  coordinates  of 
the  surface.  [22] 


Implementation  of  Neumann  Boundary  Condition 
Nemnann  boimdary  condition  can  be  implemented  by  allowing  the 
boimdary  points  to  move  such  tiiat  the  slope  at  the  boundary  is  maintained. 
For  resolving  viscous  layers,  orthogonality  of  grid  lines  at  the  body  surface  is  a 
highly  desirable  feature. 

’  Boundary  layers  typically  show  dense  grid  packing.  It  is  qiaite  practical 
to  assume  that  the  grid  plane  next  to  the  boundary  would  follow  the  contour  of 
the  boundary  fairly  accurately.  This  means,  orthogonality  of  grid  lines  can  be 
easily  achieved,  if  the  boundary  has  a  point  distribution  identical  to  that  on  its 
neighboiuing  plane.  Thus,  the  algorithm  designed  to  achieve  orthogonality 
moves  points  on  the  boundary  such  that  the  point  distribution  coincides  to  its 
immediate  neighbour. 

Grid  points  defme  all  the  surfaces  with  discrete  data.  The  projection 
and  inversion  algorithm  created  by  Yu  [24]  creates  NURBS  definitions  of  these 
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surfaces.  This  routine  returns  a  set  of  control  points  dy,  weight  functions  Wy  , 
and  the  knot  vectors  { Uo  1  aiid  (  Vq  ..Vn+i }.  These  parameters  are  then 

used  to  remesh  NURBS  surfaces,  to  achieve  orthogonalily  at  the  boundaries. 

For  preserving  geometric  defmition  of  the  body,  grid  points  are  moved 
only  by  moving  them  in  the  distribution  space.  The  new  distribution  mesh  is 
then  remapped  into  the  physical  space  using  the  NURBS  definitions  created 
earlier.  For  orthogonality,  the  distribution  mesh  of  the  boundary  smface  must 
be  exactly  identical  to  that  of  its  neighbour.  The  distribution  space  of  NURBS 
is  not  related  to  the  arclength  distribution  space.  It  depends  on  the  position  of 
knot  vectors,  control  polygons  and  the  weights  ( Figure  2.5  ).  The  (s,t)  values 

Wi=10  W2=l 

physical  space  control  points  distribution  space 

Figure  2.6  Distribution  Space  for  NURBS 

for  each  point  on  the  surface  have  to  be  calculated  from  the  NURBS  definition 
using  Newtons  method  [24].  This  requires  us  to  create  a  NURBS  definition  of 
the  smface  neighbouring  the  boundary.  However  we  already  assmned  that 
the  neighbouring  plane  must  have  a  geometry  almost  identical  to  that  of  the 
boundary.  The  principle  of  invariance  of  NURBS  dictates  that  if  one  surface 
can  be  mapped  into  another  by  a  simple  linear  transformation,  then  they  have 
the  same  weights  and  knot  vectors.  Hence,  we  only  need  a  new  set  of  control 
points  to  be  computed  for  the  face  next  to  the  boundary.  Using  that  and  the 
weights  and  knots  of  the  boundary  face,  an  algorithm  computes  the  (s,t)  dis¬ 
tribution  for  the  neig^ouring  surface.  This  distribution  is  used  to  remesh  the 
boundary  surface  with  its  own  set  of  control  points. 


22 

If  instead  of  orthogonality,  only  slope  continmty  is  needed  at  the 
NURBS  boundary,  distribution  spaces  of  two  neighbouring  planes  are  calcu¬ 
lated  instead  of  one,  as  described  above.  The  distribution  space  for  the  bound¬ 
ary  is  then  calculated  by  extrapolating  the  slopes  from  the  earlier  planes. 

The  surface  remeshing  subroutine  can  be  called  after  every  few  itera¬ 
tions  to  adjust  boundary  points. 

Search  and  Interpolation 

The  weight  ftmction  controlling  grid  point  movement  is  computed  from 
the  solution.  Since  we  follow  the  spring  analogy  form,  the  weight  distribution 
gets  distorted  as  the  points  move.  It  is  therefore  neccessary  to  interpolate  tibe 
solution  and  weights  from  the  original  mesh  onto  the  new  mesh  after  every 
few  iterations.  This  requires  a  good  search  and  interpolation  algorithm  inte¬ 
grated  wifri  the  adaptive  algorithm  to  guarantee  accurate  clustering. 
Figure  2.6  below  compares  grid  adaptation  with  and  without  interpolation. 
VWthout  interpolation,  the  grids  tend  to  keep  moving  towards  the  center.  Also, 
the  clustering  near  the  feeder  of  the  vortex  moves  away  from  the  actual  region 
of interest. 

3  Solution  interpolation  is  also  important  to  transfer  the  solution  ob¬ 
tained  on  an  unadapted  mesh  to  the  new  adapted  mesh.  Solution  interpolation 
can  be  very  time  consiunii^,  cmd  hence  a  fast  method  for  accurate  solution  in¬ 
terpolation  is  required  for  best  results. 

Several  search  and  interpolation  algorithms  have  been  investigated  in 
the  literature  [19].  A  standard  search  algorithm  to  get  the  location  of  random 
points  on  a  grid,  involves  sequential  or  esaustive  search.  This  search  moves 
systematically  through  the  entire  grid,  scanning  each  cell  until  it  finds  the  cor¬ 
rect  cell  containing  the  point.  However,  a  sequential  search  is  very  inefiScient 
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Figure  2.6  Effect  of  Solution  Interpolation 


and  esqiensive  since  it  does  not  use  any  grid  structure  information.  A  search  of 
this  kind  takes  more  time  than  the  tridiagonal  solution. 

Two  fast  interpolation  schemes  have  been  investigated  by  Mastin  [19]  . 
The  first  algorithm  is  based  on  distance  calculation.  Suppose  a  point  Q  has  to 
be  searched  on  a  gnd  starting  at  vertex  P  of  a  cell.  The  algorithm  computes  the 
distance  of  Q  &om  the  neighbours  closest  to  F,  and  moves  to  the  neighbour 
that  is  closer  to  Q  than  P.  This  step  is  repeated  untill  a  vertex  R  is  found  that 
is  closer  to  the  Q  than  any  other  vertex  This  is  a  simple  algorithm,  which  is 
definitely  faster  than  the  sequential  search.  However,  the  paper  notes  that 
this  algorithm  does  not  guarantee  finding  the  closest  point  on  highly  skewed 
grids.  The  algorithm  also  does  not  have  the  abiliiy  to  step  diagonally. 
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Figure  2.7  Mastin’s  Interpolation  Scheme  I 

In  case  of  skewed  grids,  Mastin  suggests  an  alternative  method  in 
which,  given  a  starting  cell  C,  the  algorithm  determines  the  cell  edge  such 
that  the  line  L  along  that  edge  separates  the  point  Q  from  the  rest  of  the  cell. 
The  cell  D  on  the  other  side  of  L  is  then  taken  as  the  next  approximation  and 
the  algorithm  continues  untill  Q  is  inside  the  cell  C.  This  algorithm  is  fairly 
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Figure  2.8  Mastins  Interpolation  Scheme  II 


robust  even  on  skewed  grids.  However,  in  a  three  dimensional  grid,  the  cell 
faces  can  be  non-planar.  The  paper  notes  that  due  to  this,  extension  of  this 
algorithm  to  3  dimensional  grids  does  not  guarantee  that  the  algorithm  will 
terminate  at  a  cell  C  which  contains  the  point  P. 

The  best  suited  and  the  fastest  algorithm  used  in  this  code  takes  advan¬ 
tage  of  the  local  physical  properties  by  mapping  the  global  geometry  in  terms 
of  local  coordinates  a,  p,  y  of  the  current  cell  [20].  The  interpolation  point  ^  is 
expressed  in  terms  of  these  local  coordinates  which  fall  within  a  known  inter¬ 
val  (-1  to  1 )  if  ^  lies  within  the  cell.  If  any  of  the  coordinates  fall  outside  this 
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interval,  then  next  cell  is  guessed  based  on  tiie  value  of  those  local  coordi¬ 
nates.  The  next  guess  cell  can  be  diagonal  to  the  current  cell  if  more  than  one 
coordinates  falls  outside  the  interveil.  Further  if  any  of  the  coordinate  is  ex¬ 
tremely  large  (  say  a  »  2  )  then  the  algorithm  can  intelligently  jump  more 
than  one  cell  in  the  appropriate  direction.  If  we  assume  the  nearby  cells  to 
have  approximately  similar  aspect  ratio  as  the  current  cell,  then  an  a  =  20 
would  mean  that  the  point  Xp  lies  approximately  10  cells  away.  This  method 
thus  dramatically  speeds  up  the  search  process. 

The  coordinates  a,  P,  y  for  the  point  ^  are  obtained  by  solving  a  set  of  3 
linear  equations  in  terms  of  the  8  cell  vertices, 

8 

^  (2.16) 
J=1 


where, 

=(l-a)(l-/8)(l-y)/8 
Ag  =(l  +  a)(l+/8)(l-y)/8 
Ag  =(l-a)(l-/5)(l+y)/8 
A7  =a  +  dKl  +  fi)a  +  y)/8 


Ag  =a+a)a-fi)(l-Y)/S 
A4  =(l-a)(l+/8)(l-y)/8 
Ag  =(l-a)(l+)S)(l+y)/8 
A4  =(H-a)(l-/8)(l-t-y)/8 


Equation  (2.16)  can  be  solved  directly  by  inverting  using  Cramer’s  Rule. 
However,  if  the  point  is  outside  the  cell,  then  a  direct  solution  can  give  wrong 
direction  sense.  As  shown  in  the  Figure  2.9  ,  the  sense  of  direction  changes  for 
P  beyond  the  point  of  intersection  of  the  projection  of  the  two  edges.  A  direct 
solution  would  give  this  exact  value  aind  lead  to  a  wrong  next  cell  guess.  The 
paper  suggests  Newtons  method  be  used  instead  (Equation  (2.17) ).  The  first 
guess  of  Newtons  method  is  guaranteed  to  give  the  correct  directional  sense 
for  the  next  guess  cell. 
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Figure  2.9  Problem  with  direct  solution 

8 

J=1 

f 

=  ia,P,y)^  -  y  (2.17) 

If  the  initial  guess  is  too  far  from  the  location  of  the  algorithm  can 
get  stuck  in  local  minima  if  the  grid  is  highly  skewed.  In  the  current  applica¬ 
tion,  the  possibility  of  the  initial  guess  being  too  far  away  from  the  correct 
location  is  very  remote,  since  grid  adaptation  is  not  expected  to  move  the 
points  too  far  from  their  original  location.  Fm*ther  as  the  search  proceeds  the 
first  g^ess  of  a,  p,  y  for  Newton’s  iteration  in  <he  new  cell  is  made  intelligently 
from  the  values  of  the  coordinates  computed  in  the  earlier  cell.  In  terms  of  ihe 
local  coordinates,  the  width  of  each  cell  is  2.  Hence  going  to  the  next  cell 
should  change  the  value  of  the  associated  parameter  by  approximately  2.  This 
new  improvement  in  the  algorithm  also  greatly  improves  convergence  of  the 
Newton’s  iteration. 

However  in  case  of  multiblock  solution  interpolation,  it  is  still  possible 
that  the  first  guess  of  a  cell  location  is  far  fix)m  the  point  being  searched.  In 
order  to  care  of  such  cases,  the  original  algorithm  was  improved  by  adding  a 
random  jump  to  pull  out  file  algorithm  in  case  it  gets  stuck  in  such  local  mini- 
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ma.  The  random  jump  also  enables  the  new  algorithm  to  search  grids  with 
embedded  objects 

The  search  continues  in  this  manner  until  |  (x,  P,  y  |  <1  or  till  the  search 
hits  the  boundary  and  the  local  coordinates  indicate  that  the  point  lies  outside 
the  domain.  If  a  cell  containing  the  point  is  found,  the  local  coordinates  (X,  P,  y 
can  be  used  directly  as  weights  for  trilinear  interpolation  of  the  solution  from 
cell  vertices.  This  another  major  advantage  of  this  method.  Figure  2.10  shows 
the  solution  interpolated  to  a  2  block  adapted  grid.  The  contours  match  almost 
perfectly  validating  the  algorithm  used. 


Figure  2.10  Multiblock  Solution  Interpolation 

The  aforestated  discussion  about  tiie  search  and  interpolation  consid¬ 
ered  a  single  block  of  grid.  Additional  complexity  is  introduced  when  the  do¬ 
main  consists  of  multiple  blocks,  and  points  can  move  from  the  domain  of  one 
block  to  another,  llie  blocks  then  have  to  query  neighboiuing  blocks  for  solu- 
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tion  information.  This  aspect  is  discussed  in  more  detail  in  the  next  chapter  on 
parallel  multiblock  treatement. 


CHAPTER  III 

PARALLEL  MULTIBLOCK  GRID  ADAPTATION 


Background  on  Parallel  Computing 
Large  computational  tasks  can  be  done  much  faster  by  using  multiple 
processors.  This  chapter  discusses  various  issues  involved  in  parallel  multi¬ 
block  grid  generation.  The  current  section  gives  a  background  on  the  terminol¬ 
ogy  used  in  parallel  computing.  The  later  sections  discuss  particular  problems 
associated  with  concurrent  execution  of  the  grid  adaptation  code  and  discuss 
the  approach  taken  to  solve  them. 

For  a  given  level  of  technology,  parallel  computers  would  always  have 
higher  computational  potential  than  the  vector  computers.  This  comes  from  a 
fundament  rule  in  VLSI  complexity  theory  (The  AT^  result).  Given  an  area 
n^A  for  designing  a  computer,  the  one  with  slower  components  of  size  A 
each  is  potentially  n  times  faster  than  a  single  fast  computer  of  size  n^A,  How¬ 
ever,  to  exploit  this  potential  power,  the  algorithm  running  on  the  machine 
should  have  n  concurrent  tasks  that  are  fairly  decoupled.  In  most  practical 
cases,  these  tasks  are  not  completely  independent  and  need  to  communicate 
with  each  other  by  the  means  of  messages.  The  goal  in  designing  a  parallel 
algorithm  is  to  design  a  set  of  concurrent  tasks  that  can  operate  with  mini¬ 
mum  inter  communication,  without  losing  solution  accuracy  [25]. 

We  shall  now  take  a  brief  look  at  the  design  of  a  parallel  algorithm  fol¬ 
lowed  by  a  discussion  of  some  performance  metrics  and  general  terminology 
used  in  parallel  computing. 
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Design  Methodology 

Design  of  a  parallel  algorithm  can  be  done  systematically  in  four  stages 
Partitioning,  Communication,  Agglomeration  and  Mapping  [25].  In  the  first 
two  stages  attention  is  focused  on  discovering  the  best  algorithm  for 
concurrency  and  scalability  while  the  later  steps  involve  locality  and  perfor¬ 
mance  related  issues. 

Parititioning  involves  decomposition  of  the  problem  into  multiple  tasks. 
Parallel  algorithms  can  be  designed  using  functional  decomposition  or  domain 
decomposition.  Functional  decomposition  involves  concurrent  execution  of 
parts  of  an  algorithm  that  are  fairly  disjoint.  Domain  decomposition  on  the 
other  hand  involves  partitioning  of  the  data  into  smaller  domains.  Each  of  the 
multiple  tasks  work  on  their  particular  subdomain.  Multiblock  grids  give  a 
pre-partitioned  domain,  and  hence  the  domain  decomposition  approach  comes 
naturally  in  case  of  parallel  multiblock  grid  generation 

The  next  step  involved  in  the  design  of  a  parallel  algorithm  is  commu¬ 
nication.  Individual  tasks  need  to  communicate  the  data  required  for  com¬ 
putation.  In  our  case,  this  data  consists  of  points  near  the  block  interfaces, 
-solution  maximas  and  norms,  etc.  Communication  requirements  could  be  lo¬ 
cal/  global,  structured/  unstructured,  static/  dynamic,  and  synchronous/ 
asynchronous.  Allowing  a  general  multiblock  topology  means  we  can  have  ar¬ 
bitrary  graphs  representing  block  connectivities.  Hence  communication 
pattern  is  unstructured.  Exchange  of  face  information  is  thus  an  example  of 
local  unstructured  static  asynchronous  communication.  Caculation  of  norm  at 
the  end  of  each  iteration  on  the  other  hand  is  global  synchronous  communica¬ 
tion.  The  message  passing  interface  chosen  for  this  program  must  therefore 
support  as3mchronous  communications. 
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Agglomeration  looks  at  actual  organization  of  the  algorithm  with  atten¬ 
tion  focussed  on  increasing  algorithmic  efficiency.  This  would  include  grouping 
of  particular  tasks,  and  replication  of  the  data  and  associated  computation  to 
minimize  time  lost  in  communication.  In  a  one  dimensional  elliptic  solver, 
each  point  has  a  three  point  stencil.  This  means  that  a  point  i  needs  informa¬ 
tion  from  points  i-1  and  i+1  for  the  computation.  For  points  on  block  inter¬ 
faces,  this  would  mean  that  each  point  will  have  to  query  the  next  block  for 
information  about  its  neighbour.  This  communication  overhead  is  easily  re¬ 
moved  by  repheating  the  plane  adjacent  to  the  block  interface  belonging  to  the 
neibouring  block.  The  next  section  talks  about  this  in  greater  detail. 

The  final  step  of  mapping  considers  where  each  process  is  executed. 
Mapping  adjacent  blocks  on  adjacent  processors  can  give  higher  communica¬ 
tion  eflficien<y.  This  worild  require  knowledge  of  the  system  hardware  which  is 
beyond  the  scope  of  this  work.  However,  the  underlying  message  passing  in¬ 
terface  that  spawns  and  manages  the  processes  can  provide  such  functionahly 
(MPI)  [271.  Provision  has  been  made  in  the  code  to  utihze  that  feature  if  it  is 
provided.  The  algorithm  also  provides  an  option  for  the  user  to  manually  con¬ 
trol  the  processors  to  which  each  process  maps  on  SGI  machines. 

Performance  Metrics 

Performance  measurement  of  a  parallel  program  can  consist  of  a  vari¬ 
ety  of  considerations.  Execution  time,  memoiy  requirements,  parallel  efden- 
dency,  latency,  J/0  rates,  design  costs,  maintainance  costs,  portability,  hard¬ 
ware  requirements,  hardware  costs,  potential  for  reuse,  scalability  etc.  are 
just  to  name  a  few.  The  most  important  metrics  vaiy  for  each  application  [25]. 
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Minimization  of  execution  time  on  the  available  hardware  is  the  basic 
goal  of  this  algorithm.  This  can  be  achieved  if  the  code  makes  the  best  use  of 
available  computational  power. 

Efficiency  is  a  measure  of  the  fraction  of  time  that  processors  spend  in 
doing  useful  work.  It  characterizes  the  effectiveness  with  which  ein  algorithm 
uses  the  computational  resources  of  a  parallel  computer  in  a  way  that  is 
independent  of  problem  size.  Efficiency  can  be  formally  defined  as 

£  =  A 

^relative 

where  Ti  is  the  execution  time  on  one  processor  while  Tp  is  the  execu¬ 
tion  time  on  P  processors  [101125]. 

Speedup  gives  the  reduction  in  execution  time  achieved  by  running  on  P 
processors. 


C  =  ^  ^  =  pp 

^relative  T  ^^relative 


(3.2) 


In  case  of  the  parallel  multiblock  solver  discussed  here.  The  number  of 
processes  depend  on  the  number  of  blocks  in  the  grid.  Execution  time  on  a 
single  processor  is  thus,  impossible  to  find.  We  can  however  define  the  efficien¬ 
cy  metric  with  respect  to  the  execution  time  of  a  representative  sequential 
multiblock  grid  generator.  Table  4.5  in  Chapter  IV  compares  run  time  for  a 
five  block  grid  grid  with  that  using  GUMB. 

Scalability  is  an  important  performance  measure  that  describes  beha¬ 
viour  of  an  algorithm  with  changes  in  the  problem  size  or  the  computational 
power.  The  exact  definition  of  scalability  is  highly  debated  [10].  One  of  the 
most  widely  accepted  methods  is  to  define  scalability  in  the  sense  of  isoeffi- 
cien<y.  The  isoefficiency  function  is  defined  as  the  rate  at  which  the  problem 
size  should  grow  with  the  number  of  processors  to  maintain  a  fixed  effieciency. 
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PMAG  is  primarily  aimed  at  aerospace  applications.  The  test  cases 
available  have  a  fixed  problem  size  in  terms  of  number  of  grid  points  £ind  num¬ 
ber  of  blocks.  Thus,  obtaining  a  performance  metric  in  terms  of  isoefficiency  is 
also  veiy  difficult.  The  results  presented  in  Chapter  IV  compare  the  time  re¬ 
quired  to  adapt  grids  to  the  flow  aroimd  a  missile  using  different  number  of 
blocks.  There  is  also  a  comparison  of  the  times  required  to  run  6  blocks  on  dif¬ 
ferent  number  of  processors.  While  niether  of  these  give  a  precise  value  for  the 
scalability  of  the  algorithm,  they  can  be  viewed  as  qualitative  results  indicat¬ 
ing  the  potential. 

Parallel  Implementation 

With  that  background  on  Parallel  algorithm  design  methodology  and 
the  important  metrics  for  assesment  of  its  perfomance,  this  section  takes  a 
look  at  the  actual  implementation  details  of  the  parallel  grid  generation  algo¬ 
rithm. 

Each  block  has  to  be  solved  in  an  independent  processes.  PMAG  spawns 
processes  equal  to  the  number  of  blocks  in  the  topology.  Each  block  is  ejected 
to  be  stored  in  a  separate  disk  file.  'Hus  is  done  to  enable  concurrent  reading 
and  writting  of  grid  files  by  each  process.  The  user  needs  to  supply  the  connec¬ 
tivity  information  for  each  block.  This  includes  information  about  shared  faces 
and  fixed  patdies  (Figure  3.2  ).  The  grid  blocks  are  read  concurrently  by  all 
the  processes. 

Grid  lines  must  be  continuous  across  adjoining  blocks.  The  inteiv-block 
faces,  edges  and  vertices  which  do  not  describe  a  definite  fixed  body  must  be 
fi'ee  to  float  in  the  space.  This  requires  that  the  block  faces  be  solved  with  an 
exchange  of  information  across  the  block  faces.  A  strategy  of  replication  of 
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data  is  adopted  to  minimize  tiie  communication  which  is  expected  to  he  costli¬ 
er  them  the  extra  computation. 

Once  the  grid  is  read  in,  the  blocks  add  ghost  faces  to  accomodate  grid 
points  from  neighboring  blocks  to  medntain  continuity  across  shared  faces.  Ex¬ 
change  of  data  is  done  in  em  efficient  manner  by  the  use  of  asynchronous  (non- 
blocking)  communication  cedis. 

The  processor  first  posts  recieves  for  all  patches  expected  and  then 
packs  and  sends  the  patches  from  itself  to  its  neighbors.  In  the  meemwhile 
some  of  the  recieves  are  completed  in  the  background.  The  recieved  buffers  are 
then  copied  out  into  appropriate  buffers  as  the  calls  are  completed.  This  tech¬ 
nique  effectively  minimizes  the  time  lost  in  communication. 

Each  processor  then  computes  the  control  functions  and  runs  the  ellip¬ 
tic  solver.  Face  information  is  exchanged  at  the  end  of  every  iteration  of  the 
elliptic  solver  and  a  global  norm  is  computed  to  determine  the  convergence  cri¬ 
teria.  Solution  interpolation  and  boimdaiy  point  movement  using  NUEBS  is 
done  after  every  n  iterations  as  specified  by  the  user. 

It  might  be  noted  however  that  the  number  of  tasks  is  fixed  by  the  nmn- 
ber  of  blocks  in  the  grid.  If  the  algoriflim  is  restricted  to  this,  it  would  not  be 
scalable  in  terms  of  its  ability  to  solve  larger  problems  on  more  number  of  pro¬ 
cessors.  This  algorifiim  achieves  scalability  by  the  use  of  threads.  If  more  pro¬ 
cessors  are  available,  the  processes  subdivide  their  domain  by  unrolling  the 
outmost  loop  of  the  solver  and  the  search  algorithm.  Threads  are  spawned  to 
work  on  each  subdomain. 

Use  of  thr^^  is  advantageous  in  more  than  one  ways.  Scalability  is 
the  primary  advantage.  Maximum  use  of  available  resources  is  harnessed  by 
splitting  blocks  into  threads.  Controlling  number  of  threads  spawned  by  each 
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process  aids  in  load  balancing.  The  larger  blocks  can  spawn  more  threads  than 
the  smaller  blocks.  Splitting  the  domain  into  smaller  chimks  leads  to  better 
cache  performance. 


Blocks  Split 
into  Processes 


Figure  3.1  Splitting  a  block  using  threads 


The  disadvantage  of  this  approach  is  that  it  assumes  a  shared  memory 
machine  capable  of  creating  Light  Weight  Processes  (LWP).  In  case  the  code 
has  to  be  run  on  a  distributed  memory  machine,  or  a  network  of  workstations, 
then  a  preprocessor  splits  the  grid  into  appropriate  number  of  blocks  before 
spawning  the  grid  adaptation  processes  through  MPI. 

On  SGI  machines,  the  algorithm  allows  the  user  to  specify  the  proces¬ 
sor/processors  on  which  the  block  and  its  associated  threads  will  be  solved. 
This  feature  helps  in  load-balancing  in  case  the  number  of  processors  avail¬ 
able  are  less  tiian  flie  number  of  blocks.  The  user  can  specify  the  large  blocks 
to  be  solved  on  individual  processors  and  the  smaller  blocks  to  be  grouped  to¬ 
gether  to  share  the  rest  of  the  processors. 

The  sample  input  file  for  an  individual  block  shows  tiie  way  a  user  can 
specify  the  threads  to  run,  the  shared  faces,  fixed  boxmdaries,  and  NURBS 
surfaces 
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3 

1 

0  13  4 

0 

3 

1,1,29,1,64 

2,41,71,1,64 

3,1,70,62,97 

0 

0 

0 

0 

1 

1 


1,1,70,1,97 

1 


0,30,40,1,63 


0 

0 

0 

0 


Number  of  THREADS 
Specify  CPUs  ?  (1/0) 

List  of  CPUs  if  line  2  =  l 
Imin  has  0  shared  patches. 
Imax  2  has  3  shared  patches 
block, jmin, jmax, kmin, kmax 


Jmin  has  0  patches 
Jmax  has  0  patches 
Kmin  has  0  patches 
Kmax  has  0  patches 
Any  Fixed  Patches?  (1/0) 

0  Fixed  Patches  on  Tmin 
NURBS, jmin, jmax, kmin, kmax 
1  Fixed  Patch  on  Imax 
NURBS? ,  jmin,  jmax,  kmin,  kmax 
0  Fixed  Patches  on  Jmin 
0  Fixed  Patches  on  Jmax 
0  Fixed  Patches  on  Kmin 
0  Fixed  Patches  on  Kmax 


NOTE:  Faces  must  be  specified  in  cyclic  order,  for  example 
a  J=constant  face  is  specified  as  Icmin,  kmax,  imin,  imax 

Fig(ure  3.2  Sample  Block  Information  File 

The  next  figure  shows  the  flowchart  for  the  parallel  grid  adaptation  al¬ 
gorithm. 


Treatment  of  Shared  Faces 

One  of  the  most  important  factors  in  multiblock  grid  generation  is  to 
allow  movement  of  points  on  the  shared  block  faces.  These  blockfaces  in  space 
are  fictitious  surfaces  and  hence  should  not  be  imposed  as  fixed  boundaries. 
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Further  it  is  important  that  grid  lines  have  C®  and  continuity  across  the 
grid  faces. 

1b  guarantee  such  a  continuity,  at  the  end  of  each  iteration,  each  block 
sends  a  face  to  its  neighbouring  block  as  shown  in  Figiu*e  3.4  .  The  elliptic 
generator  can  then  run  using  the  face  from  the  neighboming  block  as  the 
dirichlet  boundary.  After  every  iteration  the  new  updated  faces  are 
transmitted  to  the  neighboring  block.  This  way  ,  the  face  points  are  solved 
■using  the  elliptic  equations  at  every  step.  Each  block  solves  the  for  the  face 
points  separately.  Hence  it  is  possible  that  the  face  points  may  not  coincide 
after  an  iteration. 

1b  get  rid  of  this  slight  discontinuity,  some  sequential  multiblock  codes 
keep  a  track  of  faces,  edges  and  vertices  belonging  to  each  block.  Cormectivity 
tables  are  maintained  to  identify  shared  vertices  edges  and  faces.  At  the  end  of 
every  iteration,  these  cormectivity  tables  are  used  to  pick  out  the  copies  of  the 
common  face,edge  or  vertex  and  then  an  averaged  (or  elliptically  solved )  val¬ 
ue  is  broadcast  to  all  the  owner's.  Another  method  especially  amenable  in  C 
codes  uses  pointers  to  faces,  so  that  only  a  single  copy  of  a  shared  face  is 
maintained,  with  both  blocks  pointing  to  the  same  memory  location. 

Clearly  the  later  strategy  carmot  be  used  when  the  blocks  are  being 
solved  on  different  processors.  The  first  strategy  of  collecting  all  copies  and 
then  broadcasting  an  average  can  work  in  a  parallel  implementation  but  it 
means  a  lot  of  communications.  (8  vertices  12  edges  and  6  faces  per  block). 

Assume  a  simple  structured  topology.  Each  face  can  be  shared  by  only 
one  otiier  block.  One  edge  is  shared  by  4  blocks  and  a  vertex  is  shared  by  8 
blocks.  In  such  a  case,  for  getting  unique  face,edge  and  vertex  locations  Each 
block  needs  to  communicate  with  26  neighbouring  blocks! 
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Using  the  above  strategy  would  require  a  master  process  to  maintain 
the  entire  connectivity  information  .  This  process  would  then  coordinate  the 
communications  and  averaging  of  faces.  Such  an  algorithm  would  essentially 
consist  of  large  sequential  part  when  the  master  coordinates  the  communica¬ 
tions.  Further  complicated  connectivity  tables  will  have  to  be  maintained. 
Hence  attention  was  diverted  to  invistigating  an  alternative  algorithm  which 
would  substantially  reduce  the  complexity. 

It  was  argued  that  since  elliptic  system  has  a  3  point  stencil,  and  since 
both  the  blocks  have  identical  copies  of  all  the  three  faces  required  for  com¬ 
putation  of  the  block  boundary  face,  the  face  points  calculated  by  the  each 
block  using  the  elliptic  system  should  be  the  same. 

However,  problems  arise  if  the  grid  generation  algorithm  uses  a 
Ixidiagonal  solver  to  compute  the  new  location  of  the  points.  Since  the 
tridiagonal  solver  consists  of  a  forward  and  backward  sweep  along  a 
coordinate  line  (say  the  I  direction).  The  new  location  of  any  point  (ij,k)  is  no 
longer  dependent  only  on  (i— lj,k)  and  (i+lj,k).  This  leads  to  major 
discontinuily  of  lines  on  the  1=  TninimTim  and  I  =  ma^dmum  boimdaiy  faces.  ( J 
and  K  faces  will  not  haye  a  problem ) 

A  different  approach  is  taken  to  guarantee  that  this  does  not  happen. 
While  the  block  interior  is  solved  using  the  tridiagonal  system  (since  it  gives 
faster  convergence),  the  faces  are  solved  individually  using  Point  Jacobi  Itera¬ 
tion.  The  control  functions  for  the  points  on  the  shared  face  are  also  evaluated 
using  only  a  3  point  stencil  guaranteeing  that  neighboring  blocks  compute  ex¬ 
actly  identical  locations  of  the  shared  points . 

This  approach  is  in  essense  similar  to  the  averaging  approach  described 
above.  However,  in  this  strategy  each  block  computes  the  shared  faces,  instead 
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of  a  single  processes  computing  the  shared  points  and  then  broadcasting  them 
to  all  owners.  This  strategy  avoids  communication  by  doing  redundant  com¬ 
putation  on  each  processor. 

In  the  strategy  described  above,  one  block  needs  to  communicate  twice 
(see  next  section.)  with  only  6  blocks,  thus  requiring  12  communications.  So 
even  for  a  simple  topology  this  strategy  cuts  down  the  communication  by  one 
half.  The  advantage  of  reducing  communication  becomes  even  more  important 
for  a  general  topology  with  N  blocks  sharing  a  face,edge  or  vertex. 

Simplicity  of  the  entire  scheme  is  a  major  advantage.  No  complicated 
global  connectivity  tables  need  to  be  maintained.  This  makes  the  code  ex¬ 
tremely  flexible,  enabling  it  to  handle  a  wide  variety  of  block  topologies,  in¬ 
cluding  0  grids  embeded  in  H  grids.  Periodic  boundary  conditions  etc. 

For  the  above  strategy  to  work,  the  basic  premise  is  that  all  blocks  shar¬ 
ing  a  particular  point  must  have  identical  copies  of  its  complete  stencil.  For 
each  block,  the  user  specifies  the  blocks  that  share  a  face  with  it.  This  means 
that  a  block  has  no  information  about  its  diagonally  opposite  neighbour.  A 
communication  strategy  that  allows  each  block  to  acquire  information  from  its 
diagonal  neighbours  is  described  in  the  next  section. 

Block-block  communication  of  shared  faces 

The  approach  described  above  requires  that  a  block  knows  tiie  extra 
faces  entirely.  If  blocks  send/recv  only  their  own  faces,  then  it  gives  rise  to  a 
problem  as  shown  in  the  figure  for  a  2-D  case. 

To  overcome  this  problem,  the  blocks  communicate  faces  inclusive  of  the 
extra  points  recieved  fi*om  the  nieghboming  blocks  (  shown  by  the  dashed- 
blocks).  This  however  means  tiiat  the  block/blocks  that  send  the  points  before 
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recieving  the  faces  from  the  nieghbors  would  send  out  void/old  points  to  the 
neighbours 

One  way  to  over  come  this  problem  is  to  have  the  commimication  go  in  a 
cychc  loop.  So  that  each  face  communicates  its  information  only  after  it  has 
recieved  the  relevant  information  from  its  other  neighbours.  This  however 
means  that  the  entire  communication  will  be  sequential  and  hence  a  great  in¬ 
crease  in  the  communication  time.  Further  the  blocks  taking  part  in  a  commu¬ 
nication  need  to  know  all  the  neighbours.  So  its  necessaiy  to  store  the  connec¬ 
tivity  information  of  the  all  the  blocks.  This  defeats  one  of  the  major 
advantages  gained  in  the  face  treatment  strategy. 

The  problem  can  be  effectively  overcome  if  we  do  the  same  communica¬ 
tion  one  more  time  to  fill  up  the  voids.  This  strategy  allows  us  to  use  asynch¬ 
ronous  communication,  so  that  more  than  just  2  blocks  communicate  at  any 
instant.  Further  no  additional  connectivity  information  needs  to  be  stored. 
Each  block  needs  to  only  know  the  nieghbours  that  share  its  faces.  Note  that 
doing  the  entire  communication  twice  does  involve  trasmission  of  some  redun¬ 
dant  information.  This  could  be  avoided  by  sending  only  the  edges  and  vertices 
In  the  second  communication.  This  has  feature  has  not  been  implemented  in 
PMAGyet. 

This  strategy  described  in  this  section  is  unique  to  this  work.  The  next 
section  discusses  solution  interpolation  algorithm  for  multiple  blocks  running 
concurrently. 

Parallel  Multiblock  Solution  Interpolation 

The  solution  adaptive  grid  generation  process  involves  moving  grid 
points  according  to  the  solution  gradients.  However  as  the  points  are  moved 
the  original  solution  needs  to  be  interpolated  on  to  the  new  grid.  A  parallel 
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gird  adaptation  algorithm  supporting  general  multiblock  topologies  makes 
solution  interpolation  significantly  more  complex.  The  grids  points  can  move 
outside  the  original  domain  of  a  block .  In  such  a  case,  the  block  does  not  have 
enough  information  to  interpolate  the  solution  and  adaptive  functions  to  all  its 
points.  Each  block  now  needs  to  query  all  other  blocks  for  the  points  that  are 
no  longer  within  its  own  domain. 

The  code  presented  has  been  designed  to  required  minimal  dependency 
on  topological  information.  Since  each  block  knows  only  its  immediate  neigh¬ 
bours  that  share  a  face,  it  is  impossible  to  intelligently  query  other  blocks  for 
solution  information.  (The  point  may  have  moved  to  a  diagonally  opposite 
block  of  which  the  current  block  has  no  information). 

The  strategy  in  this  case  is  to  involve  all  the  blocks  in  the  query.  The 
information  can  be  queried  fi*om  the  other  blocks  by  either  passing  informa¬ 
tion  of  extra  points  thru  the  list  of  blocks  in  a  circle  (Shift),  or  by  concatenating 
all  the  points  into  an  array  (Allgather)  and  then  doing  an  Allreduce  over  all 
the  blocks  to  get  the  solutions.  The  Shift  operation  takes  P  steps  to  complete. 
While  the  Allreduce  takes  4*log  P  steps  to  complete.  As  a  result,  for  number  of 
processes  P  <  16,  the  shift  is  faster  than  the  all  reduce.  Ideally  a  polyalgorithm 
should  be  used  to  switch  methods  according  to  the  process  group  size.  The  cur¬ 
rent  implementation  however  uses  only  the  Allreduce  method  for  the  queries. 

Figure  3.6  shows  the  algorithm  used  for  parallel  multiblock  search 
and  interpolation.  The  algorithm  can  be  made  scalable  by  splitting  each  of  the 
for  loops  over  multiple  processors  using  threads. 

The  next  chapter  discusses  the  results  obtained  on  several  test  cases.  It 
demonstrates  the  capability  of  the  algorithm  to  handle  diverse  configurations, 
and  lists  its  relative  merits/  demerits  over  other  similar  packages. 
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For  each  point  in  new  grid 

{ 

Search  point  on  old  grid. 

if  point_not_found  Add  point  to  list  of  ExternalPoints 
else  Interpolate  solution  to  the  point  location. 

} 

Gather  all  external  points  into  AllExternalPoints. 

For  each  point  in  AllExternalPoints 

{ 

Search  point  on  old  grid, 

if  point_not_found  Solution  =  0.0 

else  Interpolate  solution  to  the  point  location. 

} 

Sxjin  the  Solution  array  across  all  blocks  to  make  it  known 
to  all  blocks . 

Figure  3.6  Algorithm  for  Parallel  Multiblock  Search 


CHAPTER  IV 
RESULTS 


The  algorithm  PMAG  described  in  the  earlier  chapters  has  been  succes- 
fiilly  applied  to  several  test  cases.  This  chapter  enlists  tiie  results  obtained 
and  lists  the  merits  of  this  code  as  compared  to  the  pre-existing  codes.  The 
last  section  points  at  possible  future  work. 

The  ARL  Missile  test  cases 

The  main  motivation  for  generating  this  parallel  adaptation  code  came 
from  the  need  for  developing  a  fast  grid  adaptation  algorithm  for  simulating 
the  ARL  Missile  test  cases  [30].  This  work  was  a  part  of  the  Computational 
Fluid  Dynamics  (CFD)  simulations  performed  in  support  of  the  KTA— 12  pro¬ 
gram.  This  program  was  designed  to  evaluate  computational  technology  for 
predicting  highly  separated  flowfields  for  missile  configurations. 

NPARC3D  flow  solver  was  used  for  computing  the  solution.  This  solver 
is  based  on  the  Beam-Warming  implicit  approximate  factorization  algorithm 
that  solves  the  set  of  equations  produced  by  central-differencing  the  Navier- 
Stokes  equations  on  a  multi-block  structured  grid.  The  code  is  very  well  docu¬ 
mented  and  validated  [181.  The  turbulence  model  used  is  Baldwin-Lomax. 

The  exifitirig  code  by  Hugh  [2]  with  the  CSIP  solver  was  very  slow. 
Table  4.1  shows  the  run  time  for  the  original  algorithm  as  compared  to  the 
time  taken  for  the  current  algorithm.  Figure  4.1  shows  the  comparison  of 
point  clustering  near  the  shock  for  each  algorithm. 
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Algorithm 

#  Blocks 

#  Procs 

Run  lime  (real) 

Hugh’s  Solver 

1 

1 

7:34:58.29  hours 

PMAG 

1 

1 

58:15.64  mins 

1 

PMAG 

2 

2 

32:33.70  mins 

1.8 

PMAG 

6 

6 

13.14.23  mins 

4.4 

Table  4.1  Comparison  of  runtime  for  PMAG  with  Hugh’s  Algorithm 


Hugh’s 

Solution 


PMAG 

Solution 


Figure  4.1  Comparison  of  adaptation  at  shock 
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All  the  runs  in  Table  4.1  were  done  on  a  12  processor  SGI  Power  Onyx 
machine  (CPU:  150Mhz  R4400,  Main  memory:  1.6GB).  For  sequentied  run¬ 
time,  all  the  blocks  were  combined  into  a  single  grid  block  and  the  same  pro¬ 
gram  was  used.  This  result  cannot  be  used  forjudging  the  scalability,  since  the 
single  block  run  does  not  involve  communication  of  boimdary  information  be¬ 
tween  faces,  and  multiblock  solution  interpolation.  The  efficiency  of  0.73  in¬ 
spite  of  the  overheads  is  thus  substantial  in  this  context. 

Parallel  Performance 

Im  most  cases  with  complicated  multiblock  topologies,  the  grids  carmot 
be  combined  into  a  single  block.  In  such  a  case,  if  the  numer  of  processors 
available  is  less  than  the  number  of  blocks,  the  process  have  to  time  share  on 
the  vailable  CPUs.  On  the  other  hand  if  the  number  of  processors  exceeds  the 
number  of  blocks,  each  block  can  spawn  multiple  threads  to  take  advantage  of 
available  resources.  Table  4.2  shows  a  comparison  of  runtimes  on  a  range  of 
processors.  The  speedup  and  efficiency  is  calculated  based  on  the  runtime  for 
the  entire  grid  as  a  single  block  ( Fastest  sequential  runtime  possible ). 


#procs 

Run  Time  (real) 

Speedup 

Efficien(y 

1 

1:12:20:38  (expected) 

0.8063 

2 

40:20.38  mins 

1.4442 

0.7221 

3 

26:30.06  mins 

2.1981 

0.7326 

6 

13:14.23  mins 

4.4017 

0.7336 

9 

11:28.21  mins 

6.0799 

0.6644 

11 

10:07.67  mins 

6.7670 

0.6227 

Table  4.2  Absolute  Speedup  and  Efficency  for  6  blocks 


PMAG  can  also  be  run  over  a  network  of  workstations.  Table  4.3  shows 
ilie  runtime  for  a  cluster  of  SGI  Indigo^  workstations.  Each  workstation  has  a 
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lOOMHz  R4000  CPU  and  128MB  main  memory.  The  super  linear  speedup  is 
because  of  memory  swapping.  (The  grid  is  80MB  and  solution  is  110MB.) 


#  Blocks 

#  machines 

Run  Time  (real) 

1 

1 

4:02:40.24  hours 

6 

6 

31:64:48  mins 

7.6 

Table  4.3  Runtime  over  a  network  of  workstations 


Figure  4.2  Graph  of  Speedup 


Grid  Adaptation  Results 

Grid  adaptation  was  used  for  3  test  cases  in  the  ARL  report  [30]. 
Table  4.4  shows  improvement  in  body  force  calculation  by  grid  adaptation  in 
case  3.  (Mach  2.6,  angle  of  attack  14®,  Pq  =  42”  Hg,  To  =308  K,  Re  =4x10®  /feet) 
Tlie  figures  oh  the  following  pages  show  the  improvement  in  solution 
accuracy  in  terms  of  better  resolved  flow  features  and  improved  body  force 
computations,  obtained  for  test  case  2  (  Mach  1.8,  angle  of  attack  of  14®,  Po  = 
14”,  Hg,  To  =  316  K,  Re  =  2x10®  /feet ) 


Forces 

Experimental 

Data 

Unadapted 
NPARC  (BL) 

Adapted 
NPARC  (BL) 

Axial 

0.1957 

0.3307 

0.3309 

Normal 

1.9100 

1.8855 

1.9052 

Moment 

10.2417 

10.0314 

10.2060 

Table  4.4  Forces  and  Moments  Comparison  for  Ceise  3 


ARL-Missile  Case  2 

Mach  1.^"  Alpha  14 


Tire  4.6  Improvement  in  shock  resolution 


Figure  4.7  Close  up  of  improved  shock 
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5-Block  grid  around  a  Missile  Fin 

Table  4.5  gives  the  comparison  of  runtime  for  generating  an  elliptic 
grid  in  5  blocks  around  a  missile  fin.  The  results  show  that  a  near  linear  (abso¬ 
lute  effieciency  0.96)  speedup  is  achieved  by  the  parallel  algorithm.  The  speed¬ 
up  in  this  case  looks  much  better  than  the  ARL-Missile  cases,  since  this  exam¬ 
ple  does  not  involve  grid  adaptation.  NURBS  evaluations  are  not  done  since 
surfaces  are  treated  as  dirichlet  boimdaries. 


Algorithm 

#  procs 

Run  Time  (real) 

Speedup 

Sequential  (GUMB) 

1 

2:36:16  horns 

1 

PMAG 

7 

23:56  mins 

6.72 

Table  4.5  Comparison  of  runtime  for  GUMB  v/s  PMAG 


Figure  4.8  Smoothened  grid  around  the  fin 


Figure  4.9  5— Block  grid  around  Missile  Fin 
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General  3-D  geometry 

This  example  demonstrates  PMAG’s  capability  to  generate  smooth  el¬ 
liptic  grids  around  complicated  topologies.  The  grid  is  a  single  block  0-grid.  K 
minimum  and  K-maximum  faces  coincide  with  each  other.  The  0-grid  is  sim¬ 
ulated  by  specifying  the  K  faces  to  be  shared,  with  the  next  block  as  the  cur¬ 
rent  block  itself.  The  block  then  sends  the  face  to  itself  and  puts  it  in  the  ghost 
cells  in  the  appropriate  place.  When  a  block  communicates  with  itself,  PMAG 
assigns  unique  tags  to  the  faces,  so  that  messages  do  not  get  mixed  up. 

Figure  4.13  shows  the  critical  region  for  this  geometry.  An  elliptic  grid 
would  usually  crossover  at  the  comer.  The  grid  shown  was  obtained  by  doing 
one  smoothing  iteration  on  the  geometric  control  functions  evaluated  over  the 
entire  grid.  (  No  smootiiening  is  done  for  the  control  function  affecting  the 
packing. )  The  grid  is  allowed  to  converge,  and  geometric  functions  are  recom¬ 
puted  after  a  few  iterations  ( in  this  case  20 ).  The  final  grid  was  obtained  in  35 
iterations. 


Figure  4.10  The  O  type  grid 
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Figure  4.11  3D  geometry  with  grid 
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Figure  4.12  Side  view  of  entire  geometay 


Figure  4.13  Zoomed  view  of  critical  section 


Bronchial  Tabes 
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The  following  example  shows  multiblock  grids  used  in  biomedical  ap¬ 
plications.  The  grids  represent  bronchial  tubes.  PMAG  was  used  to  smoothen 
the  grid  inside  the  blocks.  The  resxilts  show  better  slope  continuity  at  block 
interface,  and  improved  grid  quality. 


Figure  4.14  Bronchial  T\ibes 


Titan  Nozzle 
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This  is  a  4  block  2-D  geometry  of  a  convergent-divergent  nozzle  for  the 
titan  rocket.  SAGE  was  being  used  to  adapt  the  grid  to  this  case.  However,  the 
grids  obtained  using  SAGE  tend  to  lose  the  boundary  layer  packing  and  give 
highly  skewed  grids  at  the  boundaries.  Since  PMAG  is  a  3-D  code,  and  re¬ 
quires  atleast  3  plemes  in  the  k  direction  a  3-D  grid  was  created  by  simply 
stacking  the  2-D  planes  in  the  k  direction.  The  resulting  grid  shows  excellent 
shock  capturing  and  a  fairly  orthogonal  grid  in  all  regions.  The  geometric 
functions  and  Neumann  boimdaiy  conditions  implemented  in  PMAG  succes- 
fully  produce  grids  that  retain  the  boundary  layer  packing  as  well  as  ortho¬ 
gonality. 


Conclusions 


60 


A  parallel  grid  adaptation  algorithm  for  a  general  structured  multi¬ 
block  domain  (PMAG)  has  been  developed.  The  algorithm  has  been  demon¬ 
strated  to  handle  a  range  of  topologies,  like  simple  multiblock  structimes,  com¬ 
plex  structures  including  multiple  blocks  sharing  a  face,  faces  with  fixed  as 
well  as  movable  patches  and  0-grids. 

Strategies  used  for  interprocess  communication  of  shared  faces  have 
proved  to  work  very  well.  No  global  connectivity  tables  are  maintained  even 
for  general  multiblock  topologies.  Complete  continuity  of  grid  lines  is  guaran¬ 
teed  across  shared  faces. 

Processes  can  communicate  grid  point  information  and  solution  in¬ 
formation  after  any  specified  number  of  iterations  or  after  a  specific  change  in 
the  global  norm.  This  can  save  on  communication  overheads,  if  the  grids  do 
not  change  significantly  at  every  iteration. 

Each  process  operates  on  one  block.  Processes  exchange  messages  using 
the  Message  Passing  Interface  OMPI).  If  the  number  of  processors  available  is 
larger  than  the  number  of  blocks,  PMAG  can  spawn  threads  to  make  maxi¬ 
mum  use  of  the  available  resources  (Only  on  SGI  shared  memory  platforms). 
^Tlie  speedup  achieved  has  been  shown  to  be  significant.  However,  the  efficien¬ 
cy  drops  since  the  NUEBS  and  solution  interpolation  routines  do  not  use  mul¬ 
tiple  threads.  In  case  the  number  of  processors  is  less  than  the  number  of 
blocks,  the  processes  time-share  on  the  CPUs.  This  has  been  shown  not  to  be 
too  inefficient  as  compared  to  a  single  block  i*un. 

A  weight  fimction  generated  with  a  boolean  sum  of  scaled  first  and  se¬ 
cond  order  derivatives  of  all  solution  variables  has  been  used.  This  fimction 
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has  shown  success  in  identifying  solution  features  of  widely  vaiying  intensi¬ 
ties  like  shocks,  boundary  layers,  separated  regions,  and  vortices. 

Generation  of  NURBS  surfaces  using  inverse  NURBS  formulation  al¬ 
lows  boundary  point  movement  without  loss  of  geometric  fidelity  This  has 
helped  in  generating  excellent  orthogonal  adapted  grids. 

A  multiblock  solution  interpolation  algorithm  has  been  designed.  The 
algorithm  has  been  tested  to  give  very  good  results.  Solution  interpolation 
guarantees  grid  adaptation  in  accurate  regions. 

The  algorithm  has  also  been  shown  to  be  a  usefiil  and  flexible  tool  in 
generating  smooth  elliptic  grids.  The  features  provided  give  the  user  the  abil¬ 
ity  to  create  structured  elliptic  grids  around  very  complicated  geometries. 

This  algorithm  was  designed  with  the  primary  objective  of  speeding  up 
the  grid  adaptation  process.  This  objective  has  been  met  with  great  success. 

Future  Work 

The  areas  for  further  research  can  be  classified  as: 

Grid  Generation  Issues 

•  Weight  functions  depending  on  orthogonality  and  skewness 

•  Allow  user  to  define  variables  to  be  used  for  calculating  weights. 

•  Extension  to  hybrid  grids. 

•  Integrate  with  a  flow  solver. 

Parallel  Computing  and  Performance  Issues 

•  Improve  overlap  of  communication  and  computation. 

•  Multiblock  solution  interpolation  using  SHIFT. 

•  Multi-threaded  implementation  of  NURBS  routines. 

•  Implement  threads  on  SUN. 
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•  Automatic  load  balancing  using  threads. 

General  Improvements 

•  Make  the  code  fool  proof  with  extensive  error  checking  on  inputs. 


APPENDIX  A 

TRANSFORMATIONS  AND  TRIDIAGONAL  FORMULATION 
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Transformation  Relations 
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Grid  generation  involves  transformation  from  the  computational  space 
I*  to  the  physical  coordinates  xK  The  following  definitions  help  in  expressing 
the  metric  terms  and  elliptic  equations  in  a  concise  form. 


Basic  Definitions: 


a 


dXy  dxo  dx^^  . 


1, 2, 3(covariant) 


af  s  j  =  l,2,3(contravariant) 

12  3 


Sij  —i  ’  QlJ  Sji  ^  2, 3 

j=  1,2,3 

giJ  =  a^  .  ^  =  gii  1,2,3 

;=  1,2,3 

g  =  detl  gij  I  =  [  •  (  ^2  X  as)  f 


Important  Formulae: 

3 

2!  ^  ^  ^  ^  i,j,k  cyclic 

|S=1 


(S(,)p  =  +  Cp-rf?. 


*  i^‘ 


2 
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Formulation  of  the  Tridiagonal  System 
Consider  the  three  dimensional  elliptic  grid  generation  system  : 
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3  3 


i=i;=i  jt=i 


Tb  solve  the  above  equation  niunerically.  The  derivatives  at  all  points 
inside  the  field  are  evaluated  using  central  difference.  For  points  on  boundary 
one  sided  second  order  differencing  is  used. 

Central  Difference : 


„  _  ^/+l  -  ^1-1 

'^5 - 2 


'■«  =  '■(+1  -  2ri  +  r,_^ 


’■fn  =  -  ’’i+uj-i  +  '■.-i,/-! 

Forward  difference  : 


= 


-  3  r,.  +  4  r,.+i  -  r,.+2 


Backward  difference : 


_  _  3  0-4  r,_,  +  o_2 

- 2 

The  equation  is  then  solved  using  a  tridiagonal  solver.  If  ri  (i-l,n)  be 
the  points  along  an  i  line,  the  entire  line  can  be  evaluated  by  casting  the  equa¬ 
tions  as  a  set  of  linear  equations  as  follows: 


67 


'  d\  d\  0  o' 

dl  dl  dl  0 

'’l 

4 

4 

•••  •••  •••  ••• 

0  4-1  4-1  4-1 

0  0  dj,  dl 

Im  ^ 

•  •• 

^n-1 

I'r, 

n 

4-1 

K 

n 

L  ^ 

where, 


dj  =  (  1  -  0.5  Pi  ) 

df  =  -2  (  ^11+^22+^33) 

df  =  1  +  0.5  Pi  ) 

Ri  =  -  2  (  +  ^23  +  g3i  ) 

-  (  '•^+1,*  +  ^ij-u  +  ^2  ) 


This  system  of  Hnear  equations  is  solved  for  each  i  line. 


APPENDIX  B 
USERS  GUIDE 
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Introduction 
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PMAG  is  a  3-dimensional  Parallel  Multiblock  Solution-Adaptive  Grid 
generator  based  on  solution  of  elliptic  partial  differential  equations.  It  can  also 
be  used  for  generating  smooth  orthogonal  grids  on  complex  multiblock  do¬ 
mains.  The  following  sections  guide  the  user  through  the  steps  required  for 
using  this  code.  The  user  is  assiuned  to  be  familiar  with  basic  grid  generation 
and  the  numerical  flow  simulation  process. 

Overview  and  Terminology 

PMAG  is  a  structured  grid  adaptation  algorithm.  Adaptation  is  done  by 
redistribution  of  grid  points.  The  total  number  of  grid  points  remains  the 
same. 

A  block  has  6  faces,  12  edges  and  8  vertices.  There  is  no  restriction  on 
block  —  block  connectivity.  One  block  can  be  share  its  faces,  edges  or  vertices 
with  any  number  of  blocks.  The  nmnbering  of  blocks  is  not  important.  Any 
block  can  be  linked  to  any  other  block  including  itself.  The  part  of  the  block 
face  that  is  shared  with  another  block/  held  fixed  /  defined  as  a  NURBS  siu*- 
face  is  called  a  patch.  Each  face  can  have  MAXPATCH  number  of  patches. 
Shared  patches  are  fi'ee  to  move  in  space.  Fioced  Patches  are  held  fixed 
throughout  the  adaptation  process.  These  points  do  not  move.  NURBS  patches 
are  defined  on  the  surfaces  where  the  geometric  fidelity  needs  to  be  conserved. 
The  grid  points  on  these  siufaces  are  moved  such  that  they  do  not  distort  the 
original  geometiy.  Block  numbers  start  with  0. 

Each  block  is  solved  in  an  individual  process.  The  process  communicate 
with  each  other  using  MPI.  The  processes  can  be  on  the  same  machine  or  on 
different  machines.  MPI  is  also  supposed  to  support  a  heterogenous  network. 
(However,  the  tests  so  far  have  not  demonstrated  this  feature).  Machine 
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names  and  executable  file  names  for  eoAih  block  are  specified  in  the  PMAG.pg 
process  group  file. 

On  SGI  machines,  each  process  can  spawn  light  weight  processes  called 
threads.  The  process  assigns  a  part  of  its  block  to  each  of  its  threads.  Each 
thread  can  run  on  an  individual  CPU,  thus  taking  advantage  of  all  the  avail¬ 
able  processing  power.  The  user  has  the  option  to  specify  the  CPU  on  which 
each  process  and  its  associated  threads  are  run.  ( See  local  input  files ) 

The  Basic  Steps 

These  are  the  basic  steps  to  generating/adapting  grids 

•  'E.mPARAM.INC  and  compile  PMAG. 

•  Put  each  block  of  the  grid  into  files  block.O,  block.1 ..  block.n 

•  If  adapting,  each  solution  block  must  be  in  soln.O,  soln.l ..  soln.n 

•  Create  local  information  files  info.O,  info.l ..  info.n  for  each  block. 

•  Create  a  global  information  file  info.global 

•  Create  the  process  group  file  PMAG.pg 

•  Run  PMAG. 

The  output  files  are: 

gridout.0,  gridout.l ...  gridout.n  :  Ascii  Grid  Blocks  in  PlotSd  format. 
soln.O,  soln.l ...  soln.n  :  Ascii  Solution  Blocks  in  PlotSd  format. 

The  Include  File  “PARAM.BSTC” 

PMAG  is  almost  completely  in  FORTRAN.  The  PARAM.INC  file  con¬ 
tains  the  maximum  array  sizes  and  related  parameters  fiiat  need  to  be  set  be¬ 
fore  compiling. 

nMAX:  Maximum  number  of  Iterations  to  run. 

MAXi:  Set  these  to  atleast  2  values  larger  than  the  maximum 

MAXj :  dimension  in  the  respective  direction. 

MAXk:  This  is  to  accomodate  the  ghostfaces  for  face  exchange. 
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MAXd;  Set  to  the  largest  of  MAXi,  MAXj,  MAXk 

MAXBLOCKS:  The  maximum  number  of  blocks  in  the  grid. 

MAXPATCH:  Maximum  number  of  Patches  on  the  block  faces. 

MAXCPUS:  Maximum  number  of  CPUs  on  the  machine. 

MAXOUT:  Maximum  number  of  points  that  may  move  outside  a 

blocks  domain. 

NOSLOPE;  =  1  boundary  point  movement  criteria  is  orthogonality. 

=  0  criteria  is  slope  continuity  for  lines  near  surface. 

The  user  is  advised  not  to  change  any  of  the  other  defined  parameters. 

The  Input  files 

Solution  and  grid  is  expected  in  PLOT-3D  formatted  multiblock  format. 
PMAG  expects  an  input  of  a  3— D  multiblock  grid  in  multiple  fQes.  Each  file 
contains  one  block.  (This  means  that  each  file  will  have  NZONES  =  1.  This  is 
required  in  the  view  of  future  compatibility).  The  blocks  should  be  in  files 
block.O  to  block.n  where  n  is  the  number  of  blocks  in  the  grid.  The  correspond¬ 
ing  solutions  should  be  in  soln.O  to  soln.n. 

There  are  three  main  input  files. 

info.##  :  These  are  local  input  files.  ##  corresponds  to  the  block 

number  to  which  this  information  applies. 

info.global:  This  is  the  global  information  file.  This  information  is 
read  only  by  the  1st  block  and  then  broadcast  to  all. 

PMAG.pg:  This  file  is  required  by  MPI  to  start  up  processes  for  each 
block. 

Local  input  files;  info.## 

The  local  input  file  allows  the  user  to  specify  the  shared,  fixed  and 
NURBS  patches  on  the  block.  It  also  allows  the  user  the  specify  if  the  block 
has  to  be  run  on  multiple  threads  (  SGI  only ).  The  user  can  also  specify  a  list 
of  CPU’s  on  whidi  the  process  and  its  threads  are  run.  If  Number  of  Threads  is 
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specified  as  0,  the  process  spawns  the  default  number  of  threads  specified  in 
the  global  information  file.  Specify  CPU’s  =  0  leaves  the  scheduling  to  the  op¬ 
erating  system. 

3 
1 

0  13  4 

0 

:  3 

1,1,29,1,64 
2,41,71,1,64 
3,1,70,62,97 

0 
0 
0 
0 
1 
1 

1,1,70,1,97 

1 

0,30,40,1,63 

0 
0 
0 
0 

,,  NOTE:  Faces  must  be  specified  in  cyclic  order,  for  example 
a  J=constant  face  is  specified  as  kmin,kmax,  imin,  imax 

Sample  Local  Information  File 

A  NURBS  patch  is  specified  in  the  section  for  fixed  patches,  since  the 
surface  geometry  is  held  fixed.  The  boundary  points  however  are  allowed  to 
move  on  the  surface.  Fixed  patches  are  also  specified  with  6  numbers.  The 
first  integer  specifies  whether  the  patch  is  fixed  (0)  or  a  NURBS  (1). 


Nxamber  of  THREADS 
Specify  CPUs  ?  (1/0) 

List  of  CPUs  if  line  2=1 
Imin  has  0  shared  patches. 
Imax  2  has  3  shared  patches 
block, jmin, jmax, kmin, kmax 

Jmin  has  0  patches 
Jmax  has  0  patches 
Kmin  has  0  patches 
Kmax  has  0  patches 
Any  Fixed  Patches?  (1/0) 

0  Fixed  Patches  on  Imin 
NURBS  ? ,  j  min ,  jm^ ,  kmin ,  kmax 
1  Fixed  Patch  on  Imax 
NURBS? ,  jmin,  jmax,  kmin,  kmax 
0  Fixed  Patches  on  Jmin 
0  Fixed  Patches  on  Jmax 
0  Fixed  Patches  on  Kmin 
0  Fixed  Patches  on  Kmax 
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By  default  each  boundary  face  is  treated  as  a  fixed  Dirichlet  boundary 
condition.  A  shared  patch  is  defined  by  5  numbers.  Neighbouring  Block,  mini¬ 
mum  and  maximum  index  in  I  direction,  minimum  and  maximum  index  in  J 
direction.  The  indices  must  be  in  a  cyclic  order.  That  is  for  a  J  Face,  the  patch 
will  be  specified  as  :  NbrBlock,  Kmin,  Kmax,  Imin,  Imax. 

Global  input  file:  info.global 

Norm  of  change  in  grid  point  locations  is  calculated  globally  at  the  end 
of  every  iteration.  The  first  parameter  defines  the  convergence  criteria  for  the 
grid.  The  program  stops  if  the  norm  is  not  reached  within  the  Max  Number  of 
iterations  (2nd  line). 

If  the  original  grid  is  extremely  distorted  and  has  crossovers  etc,  its  best 
to  start  the  smoothening  process  with  a  few  laplace  iterations  to  get  rid  of  the 
crossovers. 

A  0  for  RUNTHREADS  parameter  in  the  local  input  file  spawns  the  de¬ 
fault  number  of  threads  specified  in  this  file. 

Geometric  Interpolation  fimctions  can  either  be  interpolated  fi*om  the 
boundaries  (  1 )  or  calculated  over  the  entire  grid  ( 0 ) 

If  the  starting  grid  consists  of  faces  only,  then  the  algorithm  can  gener¬ 
ate  a  starting  guess  Linear  Transfinite  Interpolated  grid  in  the  volume. 

If  Adaptation  =  0,  tiie  algorithm  will  not  read  a  solution  file,  and  just 
run  as  a  elliptic  grid  smoothener. 

Sharp  changes  in  weight  function  can  lead  to  sharp  adaptive  control 
fimctions.  Hiese  can  lead  to  g;rid  crossovers.  Its  hence  best  to  smoothen  the 
adaptive  control  fimctions  by  simple  averaging.  Each  iteration  extends  the 
field  of  influence  of  the  control  by  one  grid  point  in  all  directions. 
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Maximum  norm  of  change  to  stop  iterations 
1 . OD-7 

Max  nxamber  of  iterations  (should  be  <=ITMAX  in  PARAM.inC) 
35 

Number  of  Laplace  Iteration  at  the  beginning. 

0 

Default  Number  of  threads  on  large  blocks 
1 

Interpolate  geometric  control  functions?  (1/0) 

0 

Generate  Transfinite  interpolated  grid  from  faces?  (1/0) 

0 

Adaptation  (1/0) 

0 

Smoothening  iterations  on  the  Adaptive  control  functions 
2 

Use  boolean  weights? (1/0) 

1 

A  small  value  epsilon  for  normalizing  derivatives 

0.00001 

Multiplication  factors  for  Geometric  Control . P,Q,  S 

1.5  3.5  0.0 

Multiplication  factors  for  Adaptive  Control  P,Q,S 

3.5  3.5  0.0 

NURBS  patches  update  after  how  many  iterations  ? 

6 

Redistribute  solution  on  new  grid?  (1/0) 

0 

Redistribute  after  every  #  iterations (  If  Redistribute=l) 
10 

Sample  Global  Input  File 

Boolean  Weights  =  1  uses  a  weight  hmction  obtained  by  a  boolean  sum 
of  weights  in  each  direction.  Flse  adaptive  control  functions  are  calculated  in 
each  direction  only  with  respect  to  the  weight  function  in  (hat  direction. 
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Epsilon  serves  as  a  small  number  to  avoid  a  division  by  zero.  It  also  acts 
as  a  filter  for  the  flow  features  resolved. 

The  multiphers  magnify  the  control  fimctions  by  the  specified  amount. 
This  way  the  user  can  change  the  effectiveness  of  each  of  the  factors  control- 
hng  the  grids. 

NURBS  patches  can  be  updated  every  few  iterations.  Updating  the 
patches  every  5-6  iterations  is  usually  good  enough,  but  it  depends  on  applica¬ 
tion  and  how  much  the  grid  changes  with  every  iteration. 

The  last  parameter  specifies  the  number  of  iterations  between  solution 
interpolation.  Again  3-5  iterations  is  usually  a  good  number,  but  the  require¬ 
ments  can  vary  for  each  case. 

The  process  group  file:  PMAG.pg 

This  file  hsts  the  machines  where  each  process  will  be  created. 

MPI  spawns  the  process  by  running  rshell. 

local  0  /tmp/manoj/PMAG 

machinel  2  /usr/tmp/mano j /PMAG 

machine2  1  /tmp/PMAG 

machine2  1  /tmp/PMAG 

Sample  Process  Group  File 

Except  for  the  first  line,  each  line  consists  of 
Machine-name  Number_of_j)rocesses  Full _path_to_executable. 

More  than  one  processes  can  be  spawned  on  one  machine  in  either  of  the  ways 
as  shown  in  tiie  example  file. 

The  first  line  consists  of 
local  0  Full_path_to_executable. 

The  first  line  spawns  the  root  process  on  the  local  machine. 
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The  number  of  processes  specified  in  this  file  MUST  be  equal  to  the  total 
number  of  blocks!  For  more  information  on  the  process  group  file  refer  to  the 
MPI  reference  manual.  [27] 
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APPENDIX 


A.1  TIGER  SYSTEM 


The  development  of  TIGER  has  evolved  from  its  origind  grid  generationpurpose  into  a  comprehen¬ 
sive  simulation  system  that  contains  six  modules:  (i)  grid  generation  modme,  (ii)  visualization  mod¬ 
ules,  ^i)  network  module,  (iv)  flow  solution  module,  (v)  simulation  module,  and  (vi)  toolbox  mod¬ 
ule.  Each  may  be  acoessea  independently  to  perform  different  tasks.  These  modules  are  linked 
together  with  a  common  graphical  user  interface  (GUI)  (Figure  1). 


1  in  graphics  workstations  have  provided  the  hardware  capability  for  the  execution  of  sophis- 
ser— friendly  and-  interactive  gnd  generation  and  sdentiiic  visudization  softwares.  A  ming 


Advances  i 

ticated,  user— tnendiy  and- interactive  gnd  generation  and  scienaiic  visuauzaaon  sottwares.  A  rising 
awareness  on  the  issues  such  as  data  structures  and  grid  generation  d|orithms  ptompts  this  develop¬ 
ment  which  has  dramaticdly  reduced  the  man-hours  involved  dunng  the  grid  generation  proews 
for  many  engineering  applications.  Computer  softwares  such  as  GENIE++  (2J,  EAGLE\riew[2j 
and  NGP(5)  arc  examples  of  such  developments.  Basic  grid  generation  techniques,  such  as  tiansfi- 
nite  interpolation  (TFI),  weighted  TFI  (WTFI),  elliptic  grid  generation  ^sterns,  etc.,  arc  refined 
during  the  development  of  these  codes.  However,  for  aoplications  involving  complex  turbomachin- 
cry  flow  fields,  the  grid  generation 


Rgurc  1  Graphical  User  Imcrfacc  of  TIGER 
process  is  extreinelj*  time  consuiuing.  Consequently,  customized  grid  generation  codes  for  turbomaclun- 
ery  configurations  have  l)ce.n  developed  to  address  tliis  probleml3.4,(>l. 


In  TIGER,  the  grid  generation  step  is  accomplished  by  the  combination  of  automatic  and  interactive  cus¬ 
tomized  algoritlims  including  computational  domain  transformation,  geometry  surface  construction, 
sub-domain  frame  setup,  and  local  grid  manipulation.  Hie  domain  transformation  is  automatically  per¬ 
formed  by  mapping  tlie  physical  domain  into  computational  domain  once  the  user  specifies  the  indices  for 
the  blade  and  block  boundaries.  Tliis  index  information  is  also  used  to  perfonn  automatic  sub-blocking 
during  tlie  later  grid  generation  process.  An  interactive  procedure  is  designed  to  allow  the  user  to  ma¬ 
nipulate  critical  q-constant  curves,  sucli  as  tlie  grid  line  tliat  run  tiirough  tlie  rotor  tip  or  the  sliroud, 
where  q  is  tlie  curvilinear  coordinate  in  radial  direction.  Tlie  major  component  of  a  turbomacliine,  the 
blade,  is  constructed  automatically  by  performing  intersection  and  surface  spline-fit  with  non-unifonn 
rational  B-spline  (NURBS)  algoriUim  13,7,8,9]. 


NURBS  has  drawn  a  lot  of  attention  in  tlie  areas  of  geometry  modeling,  computer  grapliics,  and  grid 
generation  because  of  its  powerful  geometry  properties  such  as  the  convex  hull.  local  control,  variation 
diminisliing  and  affine  invariance.  It  also  has  useful  geometry  tools  sucli  as  knot  insertion,  degree  eleva¬ 
tion,  and  splittingI7,8,9]. 

A  NURBS  curve  of  m+1  control  points  can  be  defined  by  an  order  k,  a  set  of  control  points  {bi,  i-0,..,tn),  a 
set  of  knots  {ui,  and  a  set  of  weights  (ict,  i=0,..,m.\.  The  following  equations  represent  a 

NURBS  curve: 

/,,  N\iu) 


v(u)  = 


(i) 


where  the  basis  functions  are  defined  as 


N^u) 


(it  -  K/W*-Kc)  (u,^t  -  u)ffl;l(ti) 

ttn-t-t  ~  ~~  Wj+I 


N}(u)  e  1  if  Ui  ^  u  <  «j+i 

=  0  /  =  0.  1.  — ,  m  (2) 


A  NUEBS  surface  with  (fn+l)x(n+l)  control  points  can  be  expressed  by  two  orders  ku  and  kg,  a  set  of 
control  points  {6^,  i=0,..,m,j=0,.„n},  a  set  of  knots  l(ui),  iH),..,m+ku,  (vj),J^,...n+ko],  and  a  set  of  wei^ts 
iioij,  i=0,..,m,J=0,..,n).  llxe  following  equation  represents  the  NUEBS  surface: 

-  (3) 

/«o  y-0 

where  the  basis  functions  (N^  ,  A^)  are  defined  in  a  siinilar  fashion  as  Equation  (2).  The  De  Boor 

algoiithmllO]  can  be  used  to  evaluate  the  NURBS  curves  and  the  NUEBS  surfaces  with  known  evalua¬ 
tion  ordeifs),  control  points,  weights,  knot  sequences,  and  hence  the  basis  functions.  ' 


The  entire  domain  is  represented  in  a  finmework  by  connecting  each  critical  index  location,  such  as  the 
blade  leading  edge,  trailing  edge,  and  hub  nose,  etc..  Both  meridional  view  and  cascade  view  are  avail¬ 
able  to  a  user  for  spedfting  the  distribution  informations,  and  allow  the  manipulations  to  these  firames. 
An  example  for  the  cascade  firame,  meridional  finme,  and  the  corresponding  surface  grid  of  a  high  bypass 
ratio  fan  stage  are  shown  in  Figure  2. 


The  meridional  frame  represents  the  entire  domain  on  (a;,r)  plane,  where  (fcjr,6)  are  the  cylindrical  coor¬ 
dinates  used  in  TTGEE.  The  cascade  frame  represents  the  blade-4o-blade  surface  in  parametric  coordi¬ 
nates  (m-VX  The  transformation  relationship  between  Oc,r,0)  and  (m'/r)  can  be  expressed  as  follows 
(5,11]:  for  a  generative  g=g(i(u),r(u)) 

[  'g(i(u)M  du 

Ofj  =  r0y 


where 


r  «=  +  1) 

nfg  =  0.0 

i  *=  Q...,ni 
j  =  0....n, 


Each  point  p(nj,i\fAj)  on  the  associated  surface  of  revolution  is  compared  with  the  associated  m‘-x  or 
m'-r  map  to  obtain  the  proper  m'-coordinate.  Example  maps  for  an  industrial  ventilation  fan  presented 


in  Figure  8  are  shown  in  Figure  4.  However;  if  a  hub  generatrix  is  not  a  sin^e-valued  function,  such  as 
the  example  shown  in  Figure  4(a),  there  is  a  risk  of  losing  the  one-to-one  correspondence.  Ihereforo,  a 
strategy  is  taken  to  safeguard  this  critical  one-to-one  correspondence  by  decomposing  the  generatrix 
into  various  sections  based  on  the  slope  information.  Arrows  in  Figure  4  represent  the  section  break 
positions.  For  a  generatrix  section,  if  the  slope  |s|  <  1.0,  ni'-x  map  is  used,  otherwise  m'-r  map  is  uti¬ 
lized.,  ‘Iherefore,  for  each  point  is  obtained  by  comparing  with  either  m*-x  map  or  m'-r 

map.  *1116  associated  transformed  cascade  fiame  and  the  corresponding  grid  are  shown  in  Figure  6. 


Figures  Numerical  Modd  of  an  Industtial  Figured  Hub  Owme^  and  the^sodated 

Ventilation  Fan  MapsforanIndustnalFan 


(b)  Resulting  3D  Surface 


Figures  C^cade  Surface  Mapping  (Hub) 


The  cascade  surface  mapping  scheme  developed  in  this  work  is  designed  to  spline-fit  each  point  fiom 
(nt'/r)  space  to  desired  surface  of  revolution  point  by  point.  It  is  therefore  possible  to  allow  surface  grid 
generation  with  difierent  topology. 


For  a  blade  with  low  stagger  an^e  and  large  blade  leading  edge  (LB)  drde  radius,  it  is  a  very  difiELcult 
region  to  maintain  non-overiappirg  grid  with  algdiraic  approach  near  the  leading  edge.  Ebcamples  of 
this  problem  are  demonstrated  in  Figure  7(a).  A  common  practice  to  resolve  this  problem  is  to  apply 
elliptic-  system  with  proi}er  control  functions,  such  as  the  Thomas— MiddleoofF  type  control  functionsCll]. 
Results  of  elliptic  smoothing  with  Thomas-Middlecoff-type  control  function  are  presented  in  Figure  6(b). 
In  this  figure,  it  is  clearly  demonstrated  that  the  dliptic  smoothing  eliminates  the  over-lapping  grid  in 


the  problem  region.  However,  as  demonstrated  in  the  close-up  regions,  the  grid  spacing  is  being  pulled 
off  the  concave  region  and  being  pushed  closer  to  the  convex  boundary,  creating  a  sudden  change  in  grid 
spacing,  as  shown  in  Figure  6(b).  It  is  induced  by  performing  surface  elliptic  smoothing  patch  by  patch 

In  order  to  improve  the  grid  quality,  one  more  step  is  taken  in  the  algorithm  of  HGER'S-grid  generation 
module.  That  is,  the  surface  with  elliptic  smoothing  is  spline-fitted  with  NURBS  surface  formulation. 
Eaiiptic  smoothing  is  necessary  because  the  NURBS  formulation  requires  a  non-overlapping  infHfl] 
to  begin  with.  As  demonstrated  in  Figure  6(c),  the  grid  quality  has  been  improved.  It  is  shown  in  Figure 
8  that  for  this  case  the  orthogonality  near  the  blade  surface  is  improved  as  wdl. 

‘Ihe  grid  generation  module  of  the  TIGER  system  is  one  of  the  most  significant  features  that  reduces  the 
labor  requirements  for  CFD  applications  associated  with  axisymmetric  flow  fields.  This  module  has  been 
used  to  construct  structured  grids  for  axial,  radial,  and  mixed-flow  axisymmetric  configurations.  Exam¬ 
ples  for  internal  flow  field  applications  are  the  high  Reynolds  number  pump  as  shown  in  Figure  7,  the 
NASA  low  speed  centrifugal  compressor  as  presented  in  Figure  8,  and  missile  configuration  as  an  exam¬ 
ple  for  an  external  flow  field  application,  as  shown  in  Figure  9. 


Figure  6c  Elliptic  +  NURBS 


Figure?  High  Reynolds  Number  Pump  Figures  Low  Speed  Ceatrifugal  Compressor 


Figure  9  Missile  Configuration 


